Author Search Result

[Author] Ki-Seung LEE (6 hits)

  • Robust Recognition of Fast Speech

    Ki-Seung LEE  

     
    LETTER-Speech and Hearing

    Vol: E89-D No:8  Page(s): 2456-2459

    This letter describes a robust speech recognition system for recognizing fast speech by stretching the length of the utterance in the cepstrum domain. The degree of stretching for an utterance is determined by its rate of speech (ROS), which is estimated based on a maximum likelihood (ML) criterion. The proposed method was evaluated on 10-digit mobile phone numbers. The results of the simulation show that the overall error rate was reduced by 17.8% when the proposed method was employed.
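The core operation of stretching an utterance in the feature domain can be sketched as below: a sequence of cepstral frames is linearly interpolated to a new length. This is a simplified illustration, not the letter's implementation; the stretch factor here is an explicit argument, whereas the letter derives it from the ML-based ROS estimate, and the function name and interpolation scheme are assumptions.

```python
# Sketch of cepstrum-domain utterance stretching (simplified: the ML-based
# rate-of-speech estimator is replaced by an explicit stretch factor).

def stretch_cepstrum(frames, factor):
    """Linearly interpolate a sequence of cepstral frames to `factor` x length.

    frames: list of equal-length feature vectors (one per analysis frame).
    factor: >1 stretches (slows) the utterance, <1 compresses it.
    """
    n = len(frames)
    m = max(1, round(n * factor))
    out = []
    for j in range(m):
        # Position of output frame j on the original time axis.
        t = j * (n - 1) / (m - 1) if m > 1 else 0.0
        i = int(t)
        a = t - i
        lo = frames[i]
        hi = frames[min(i + 1, n - 1)]
        out.append([(1 - a) * x + a * y for x, y in zip(lo, hi)])
    return out
```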

  • Voice Conversion Using Low Dimensional Vector Mapping

    Ki-Seung LEE  Won DOH  Dae-Hee YOUN  

     
    PAPER-Speech and Hearing

    Vol: E85-D No:8  Page(s): 1297-1305

    In this paper, a new voice personality transformation algorithm which uses the vocal tract characteristics and pitch period as feature parameters is proposed. The vocal tract transfer function is divided into time-invariant and time-varying parts. Conversion rules for the time-varying part are constructed by the classified-linear transformation matrix based on soft-clustering techniques for LPC cepstrum expressed in KL (Karhunen-Loève) coefficients. An excitation signal containing prosodic information is transformed by average pitch ratio. In order to improve the naturalness, transformation on the excitation signal is separately applied to voiced and unvoiced bands to preserve the overall spectral structure. Objective tests show that the distance between the LPC cepstrum of a target speaker and that of the speech synthesized using the proposed method is reduced by about 70% compared with the distance between the target speaker's LPC cepstrum and the source speaker's. Also, subjective listening tests show that 60-70% of listeners identify the transformed speech as the target speaker's.
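The low-dimensional mapping idea can be sketched as follows: project source and target cepstra onto their KL (PCA) bases, learn a least-squares linear transform between the coefficient spaces, and reconstruct in the target basis. This is a single-class simplification under assumed names; the paper uses soft-clustered, class-wise transformation matrices.

```python
import numpy as np

# Sketch of low-dimensional vector mapping for voice conversion:
# KL coefficients + one least-squares linear transform (the paper's
# soft-clustered, class-wise matrices are collapsed into a single map).

def kl_basis(X, k):
    """Top-k eigenvectors of the covariance of the rows of X (KL basis)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)
    w, V = np.linalg.eigh(cov)          # eigenvalues in ascending order
    return V[:, ::-1][:, :k]            # columns = top-k basis vectors

def train_mapping(src, tgt, k):
    Bs, Bt = kl_basis(src, k), kl_basis(tgt, k)
    Zs = (src - src.mean(axis=0)) @ Bs  # source KL coefficients
    Zt = (tgt - tgt.mean(axis=0)) @ Bt  # target KL coefficients
    M, *_ = np.linalg.lstsq(Zs, Zt, rcond=None)  # linear transform
    return Bs, Bt, M, src.mean(axis=0), tgt.mean(axis=0)

def convert(x, Bs, Bt, M, mu_s, mu_t):
    """Map one source feature vector into the target speaker's space."""
    return mu_t + ((x - mu_s) @ Bs) @ M @ Bt.T
```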

  • HMM-Based Maximum Likelihood Frame Alignment for Voice Conversion from a Nonparallel Corpus

    Ki-Seung LEE  

     
    LETTER-Speech and Hearing

    Publicized: 2017/08/23  Vol: E100-D No:12  Page(s): 3064-3067

    One of the problems associated with voice conversion from a nonparallel corpus is how to find the best match or alignment between the source and the target vector sequences without linguistic information. In a previous study, alignment was achieved by minimizing the distance between the source vector and the transformed vector. This method, however, yielded a sequence of feature vectors that were not well matched with the underlying speaker model. In this letter, the vectors were selected from the candidates by maximizing the overall likelihood of the selected vectors with respect to the target model in the HMM context. Both objective and subjective evaluations were carried out using the CMU ARCTIC database to verify the effectiveness of the proposed method.
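The selection step can be pictured with a small dynamic-programming sketch: each frame has several candidate target vectors, and one is chosen per frame to maximize total log-likelihood under the target model plus a continuity term, Viterbi-style. The scalar features, the `loglik` callback, and the quadratic continuity penalty are illustrative assumptions, not the letter's HMM formulation.

```python
# Sketch of maximum-likelihood candidate selection: pick one candidate per
# frame by dynamic programming, maximizing frame log-likelihood plus an
# HMM-like continuity (transition) score between consecutive choices.

def select_candidates(cands, loglik, trans_weight=1.0):
    """cands: per-frame lists of candidate scalars; returns best sequence."""
    T = len(cands)
    # score[t][j]: best total score of any path ending in candidate j at t.
    score = [[loglik(c) for c in cands[0]]]
    back = []
    for t in range(1, T):
        row, ptr = [], []
        for c in cands[t]:
            best, arg = max(
                (score[t - 1][i] - trans_weight * (c - p) ** 2, i)
                for i, p in enumerate(cands[t - 1])
            )
            row.append(best + loglik(c))
            ptr.append(arg)
        score.append(row)
        back.append(ptr)
    # Backtrack the highest-scoring path.
    j = max(range(len(cands[-1])), key=lambda i: score[-1][i])
    path = [cands[-1][j]]
    for t in range(T - 2, -1, -1):
        j = back[t][j]
        path.append(cands[t][j])
    return path[::-1]
```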

  • Compensation for Shot-to-Shot Variations in Laser Pulse Energy for Photoacoustic Imaging

    Ki-Seung LEE  

     
    BRIEF PAPER-Optoelectronics

    Vol: E100-C No:11  Page(s): 1069-1072

    In photoacoustic imaging, laser power variation is one of the major factors degrading the quality of reproduced images. A simple but efficient method of compensating for variations in laser pulse energy is proposed here, in which the characteristics of the adopted optical and acoustic sensors are estimated so as to minimize the average local variation in optically homogeneous regions. Phantom experiments were carried out to validate the effectiveness of the proposed method.
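A common baseline for this problem can be sketched as per-shot energy normalization: each photoacoustic signal is scaled by the ratio of a reference energy to its shot's measured pulse energy, so fluctuations do not modulate image brightness. This assumes a calibrated per-shot energy reading; the paper instead estimates the sensor characteristics from the image itself.

```python
# Sketch of per-shot laser pulse energy normalization (a baseline, not the
# paper's sensor-characteristic estimation): divide each shot's signal by
# its pulse energy, referenced to the mean energy across shots.

def compensate(signals, energies):
    """signals: list of per-shot sample lists; energies: per-shot pulse energy."""
    ref = sum(energies) / len(energies)          # reference (mean) energy
    return [[s * ref / e for s in shot] for shot, e in zip(signals, energies)]
```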

  • Silent Speech Interface Using Ultrasonic Doppler Sonar

    Ki-Seung LEE  

     
    PAPER-Speech and Hearing

    Publicized: 2020/05/20  Vol: E103-D No:8  Page(s): 1875-1887

    Some non-acoustic modalities can reveal speech attributes that allow speech signals to be synthesized without an acoustic signal. This study validated the use of ultrasonic Doppler frequency shifts caused by facial movements to implement a silent speech interface system. A 40 kHz ultrasonic beam is directed at the speaker's mouth region, and features derived from the demodulated received signals are used to estimate the speech parameters. A nonlinear regression approach was employed in this estimation, in which the relationship between ultrasonic features and the corresponding speech is represented by deep neural networks (DNNs). In this study, we investigated the discrepancies between the ultrasonic signals of audible and silent speech to validate the possibility of totally silent communication. Since reference speech signals are not available for silently mouthed ultrasonic signals, a nearest-neighbor search and alignment method was proposed, wherein alignment was achieved by determining the optimal pair of ultrasonic and audible features in the sense of a minimum mean square error criterion. The experimental results showed that the performance of the ultrasonic Doppler-based method was superior to that of EMG-based speech estimation and comparable to that of an image-based method.
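The nearest-neighbor alignment step can be sketched as follows: each silent-utterance ultrasonic feature is paired, in the minimum-mean-square-error sense, with the closest ultrasonic feature from an audible recording, and that frame's speech parameters become the training target. Feature shapes and function names here are illustrative assumptions.

```python
# Sketch of nearest-neighbor alignment for silent ultrasonic frames: with no
# reference audio available, each silent feature borrows the speech
# parameters of its 1-NN audible-recording ultrasonic feature (MSE metric).

def align(silent_feats, audible_feats, speech_params):
    """Return one speech-parameter target per silent frame via 1-NN search."""
    targets = []
    for f in silent_feats:
        dists = [sum((a - b) ** 2 for a, b in zip(f, g)) for g in audible_feats]
        targets.append(speech_params[dists.index(min(dists))])
    return targets
```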

  • Nonlinear Long-Term Prediction of Speech Signal

    Ki-Seung LEE  

     
    LETTER-Speech and Hearing

    Vol: E85-D No:8  Page(s): 1346-1348

    This letter addresses a neural network (NN)-based predictor for the LP (linear prediction) residual. The new NN predictor takes into consideration not only the prediction error but also quantization effects. To increase robustness against the quantization noise of the nonlinear prediction residual, a constrained back-propagation learning algorithm, which satisfies a Kuhn-Tucker inequality condition, is proposed. Preliminary results indicate that the prediction gain of the proposed NN predictor was not seriously decreased even when the constrained optimization algorithm was employed.
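The flavor of constrained training can be shown in miniature with projected gradient descent on a one-tap nonlinear predictor, where a bound on the weight stands in for the robustness constraint. This is an illustrative sketch only; the model, the constraint, and the data are assumptions, not the letter's Kuhn-Tucker formulation.

```python
# Illustrative sketch of constrained predictor training: gradient descent on
# the squared prediction error of a one-tap tanh predictor, with each update
# projected onto the feasible set |w| <= w_max (a stand-in constraint).
import math

def train(samples, w_max=0.9, lr=0.1, epochs=200):
    w = 0.0
    for _ in range(epochs):
        for i in range(1, len(samples)):
            pred = math.tanh(w * samples[i - 1])   # nonlinear prediction
            err = samples[i] - pred
            # Gradient step on squared error, then project onto |w| <= w_max.
            w += lr * err * (1 - pred ** 2) * samples[i - 1]
            w = max(-w_max, min(w_max, w))
    return w
```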
