Author Search Result

[Author] Hideki KASUYA (8 hits)

  • Automatic Detection of Vowel Centers from Continuous Speech

    Hideki KASUYA  Hisashi WAKITA  

     
    PAPER-Acoustics

    Vol: E64-E  No: 10  Page(s): 640-645

    A speaker-independent algorithm is given which automatically detects the most steady-state portion of a vowel (vowel center) from continuous speech. The algorithm first extracts the segments each of which contains a vowel and, if present, pre- and/or post-vocalic liquids and semivowels, and then locates the most steady-state portion of the segment. An advantage of the algorithm is its ability to distinguish the nasal and the intervocalic liquid and semivowel segments without relying upon the formant frequencies, which have been used in most previous work on vowel segment detection. This results in a computationally simple algorithm. A test on ten sentences spoken by each of two males and two females resulted in a score of 93.2% correct vowel center localization.

  • Perceptual Contributions of Static and Dynamic Features of Vocal Tract Characteristics to Talker Individuality

    Weizhong ZHU  Hideki KASUYA  

     
    PAPER-Acoustics

    Vol: E81-A  No: 2  Page(s): 268-274

    Experiments were performed to investigate perceptual contributions of static and dynamic features of vocal tract characteristics to talker individuality. An ARX (Auto-regressive with exogenous input) speech production model was used to extract separately voice source and vocal tract parameters from a Japanese sentence, /aoiueoie/ ("Say blue top" in English) uttered by three males. The Discrete Cosine Transform (DCT) was applied to resolve formant trajectories of the speech signal into static and dynamic components. The perceptual contributions were quantitatively studied by systematically replacing the corresponding formant components of the sentences between the three talkers. Results of the experiments show that the static (average) feature of the vocal tract is a primary cue to talker individuality.
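    As an illustration of the static/dynamic split described in this abstract, the following minimal sketch decomposes a formant trajectory with a DCT and builds a hybrid trajectory from two talkers. It assumes that the static component is the zeroth DCT coefficient (the trajectory average) and that all higher coefficients form the dynamic component; the abstract does not state the exact split. The function names and the formant values are hypothetical.

```python
import math

def dct2(x):
    """Unnormalized DCT-II of a sequence."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]

def idct2(c):
    """Inverse of dct2 (a scaled DCT-III)."""
    n = len(c)
    return [c[0] / n
            + (2.0 / n) * sum(c[k] * math.cos(math.pi * k * (i + 0.5) / n)
                              for k in range(1, n))
            for i in range(n)]

def split_static_dynamic(track):
    """Assumption: coefficient 0 is 'static', coefficients 1..N-1 are 'dynamic'."""
    c = dct2(track)
    static = idct2([c[0]] + [0.0] * (len(c) - 1))
    dynamic = idct2([0.0] + c[1:])
    return static, dynamic

def swap_static(track_a, track_b):
    """Keep talker A's dynamic component and take talker B's static component."""
    _, dyn_a = split_static_dynamic(track_a)
    stat_b, _ = split_static_dynamic(track_b)
    return [s + d for s, d in zip(stat_b, dyn_a)]

# Hypothetical first-formant tracks (Hz) of equal length for two talkers.
f1_a = [700.0, 720.0, 650.0, 400.0, 350.0, 500.0, 480.0, 420.0]
f1_b = [640.0, 660.0, 600.0, 380.0, 330.0, 470.0, 450.0, 390.0]
hybrid = swap_static(f1_a, f1_b)
```

    The hybrid track has talker B's average level and talker A's movement around that level, which is one way to realize the kind of component replacement the abstract describes.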

  • F0 Dynamics in Singing: Evidence from the Data of a Baritone Singer

    Hiroki MORI  Wakana ODAGIRI  Hideki KASUYA  

     
    PAPER

    Vol: E87-D  No: 5  Page(s): 1086-1092

    Transitional fundamental frequency (F0) characteristics comprise a crucial part of F0 dynamics in singing. This paper examines the F0 characteristics during the note transition period. An analysis of the singing voice of a professional baritone strongly suggests that asymmetries exist in the mechanisms used for controlling rising and falling. Specifically, the F0 contour in rising transitions can be modeled as a step response from a critically-damped second-order linear system with fixed average/maximum speed of change, whereas that in falling transitions can be modeled as a step response from an underdamped second-order linear system with fixed transition time. The validity of the model is examined through auditory experiments using synthesized singing voice.
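    The two step responses named in this abstract can be sketched as follows. This is a minimal illustration using the standard closed-form unit step responses of a second-order linear system; the natural frequency, damping ratio, and note frequencies are hypothetical values, not parameters taken from the paper.

```python
import math

def step_response(t, omega, zeta):
    """Unit step response of a second-order system (0 <= zeta <= 1)."""
    if t <= 0.0:
        return 0.0
    if zeta == 1.0:
        # Critically damped: approaches 1 without overshoot.
        return 1.0 - (1.0 + omega * t) * math.exp(-omega * t)
    if 0.0 <= zeta < 1.0:
        # Underdamped: oscillates around 1 with decaying amplitude.
        wd = omega * math.sqrt(1.0 - zeta * zeta)
        return 1.0 - math.exp(-zeta * omega * t) * (
            math.cos(wd * t) + (zeta * omega / wd) * math.sin(wd * t))
    raise ValueError("zeta must lie in [0, 1]")

def f0_transition(f_start, f_end, times, omega, zeta):
    """F0 contour moving from f_start to f_end along the step response."""
    return [f_start + (f_end - f_start) * step_response(t, omega, zeta)
            for t in times]

times = [i / 1000.0 for i in range(500)]          # 0 to 0.499 s in 1 ms steps
# Hypothetical parameters: omega in rad/s, frequencies in Hz.
rising = f0_transition(220.0, 262.0, times, omega=40.0, zeta=1.0)
falling = f0_transition(262.0, 220.0, times, omega=40.0, zeta=0.5)
```

    With these settings the rising contour approaches the target from below without passing it, while the falling contour dips below the target before settling, which is the qualitative asymmetry the abstract reports.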

  • Significance of Suitability Assessment in Speech Synthesis Applications

    Hideki KASUYA  

     
    INVITED PAPER

    Vol: E76-A  No: 11  Page(s): 1893-1897

    The paper indicates the importance of suitability assessment in speech synthesis applications. Human factors involved in the use of synthetic speech are first discussed on the basis of an example of a newspaper company where synthetic speech is extensively used as an aid for proofreading a manuscript. Some findings obtained from perceptual experiments on the subjects' preference for paralinguistic properties of synthetic speech are then described, focusing primarily on the suitability of pitch characteristics, speaker's gender, and speaking rates in the task where subjects are asked to proofread a printed text while listening to the speech. The paper finally claims the need for a flexible speech synthesis system which helps the users create their own synthetic speech.

  • An Integrated Voice Analyzer for Acoustic Evaluation of Pathological Voice

    Yoshinobu KIKUCHI  Satoshi UCHIDA  Hideki KASUYA  

     
    LETTER-Acoustics

    Vol: E69-E  No: 10  Page(s): 1057-1059

    In order to achieve high-speed measurements of acoustic parameters needed for evaluating pathological voice, an integrated voice analyzer (IVA) has been developed by using a digital signal processor and a general-purpose microprocessor. By utilizing a personal computer as a controller of the IVA, a versatile system can be constructed for the acoustic evaluation of pathological voice.

  • An Improved Algorithm of Autocorrelation Pitch Detection

    Xueming GAO  Yoshinobu KIKUCHI  Hideki KASUYA  

     
    LETTER-Acoustics

    Vol: E67-E  No: 5  Page(s): 291-292

    An improved algorithm of autocorrelation pitch detection is presented. Preliminary experiments show that the algorithm can considerably reduce the errors caused by the ordinary autocorrelation pitch detector.
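    For orientation, the ordinary autocorrelation pitch detector that this abstract uses as its baseline can be sketched as below. This is only the plain textbook method (pick the lag with the largest normalized autocorrelation inside a search range); the abstract does not describe the improvement itself, so none is attempted here. The function name, search range, and test signal are assumptions.

```python
import math

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 in Hz for one frame by a plain autocorrelation peak search.

    Returns 0.0 when no positive autocorrelation peak is found.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]               # remove DC offset
    r0 = sum(v * v for v in x)                  # energy, used for normalization
    if r0 == 0.0:
        return 0.0
    lag_min = int(fs / fmax)                    # shortest period considered
    lag_max = min(int(fs / fmin), n - 1)        # longest period considered
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag if best_lag else 0.0

# Hypothetical test signal: a 100 Hz sine sampled at 8 kHz, 50 ms long.
fs = 8000
frame = [math.sin(2.0 * math.pi * 100.0 * t / fs) for t in range(400)]
f0 = autocorr_pitch(frame, fs)
```

    On real speech a detector of this kind is known to pick a wrong peak at times, for example a multiple of the true period; the abstract does not say which errors the improved algorithm targets.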

  • Simultaneous Estimation of Vocal Tract and Voice Source Parameters Based on an ARX Model

    Wen DING  Hideki KASUYA  Shuichi ADACHI  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 738-743

    A novel adaptive pitch-synchronous analysis method is proposed to simultaneously estimate vocal tract (formant/antiformant) and voice source parameters from speech waveforms. We use the parametric Rosenberg-Klatt (RK) model to generate a glottal waveform and an autoregressive-exogenous (ARX) model to represent the voiced speech production process. The Kalman filter algorithm is used to estimate the formant/antiformant parameters from the coefficients of the ARX model, and the simulated annealing method is employed as a nonlinear optimization approach to estimate the voice source parameters. The two approaches work together in a system identification procedure to find the best set of parameters for both models. The new method was compared with several other approaches on synthetic speech in terms of the accuracy of the estimated parameter values and proved to be superior. We also show that the proposed method can accurately estimate the parameters from natural speech sounds. A major application of the analysis method lies in a concatenative formant synthesizer which allows flexible control of the voice quality of synthetic speech.
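    To make the model structure concrete, the sketch below simulates an ARX system driven by a glottal-like pulse. It covers only the forward (synthesis) direction; the Kalman filter and simulated annealing estimation steps are not implemented. The pulse is a simplified polynomial shape in the spirit of the RK model, normalized to unit peak, and is an assumption rather than the paper's exact parameterization; whether the paper drives the ARX model with glottal flow or its derivative is not stated in the abstract. All names and values are hypothetical.

```python
import math

def rk_like_pulse(n_period, open_quotient=0.6):
    """One pitch period of a simplified glottal pulse, peak value 1.

    Open phase: 6.75 * x^2 * (1 - x) with x in [0, 1); closed phase: 0.
    """
    n_open = int(round(open_quotient * n_period))
    pulse = []
    for i in range(n_period):
        if i < n_open:
            x = i / n_open
            pulse.append(6.75 * x * x * (1.0 - x))
        else:
            pulse.append(0.0)
    return pulse

def arx_simulate(a, b, u, e=None):
    """y[n] = -sum_i a[i]*y[n-1-i] + sum_j b[j]*u[n-j] + e[n]."""
    y = []
    for n in range(len(u)):
        acc = e[n] if e is not None else 0.0
        for i, ai in enumerate(a):              # a[0] multiplies y[n-1]
            if n - 1 - i >= 0:
                acc -= ai * y[n - 1 - i]
        for j, bj in enumerate(b):              # b[0] multiplies u[n]
            if n - j >= 0:
                acc += bj * u[n - j]
        y.append(acc)
    return y

def resonator(freq, bandwidth, fs):
    """AR coefficients [a1, a2] of a single two-pole formant resonance."""
    r = math.exp(-math.pi * bandwidth / fs)
    return [-2.0 * r * math.cos(2.0 * math.pi * freq / fs), r * r]

fs = 8000
source = rk_like_pulse(80) * 5                  # five periods at 100 Hz
speech = arx_simulate(resonator(700.0, 100.0, fs), [1.0], source)
```

    An analysis method of the kind the abstract describes works in the opposite direction: given the speech samples, it searches for the source parameters and the `a` and `b` coefficients that reproduce them best.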

  • Uniform and Non-uniform Normalization of Vocal Tracts Measured by MRI Across Male, Female and Child Subjects

    Chang-Sheng YANG  Hideki KASUYA  

     
    PAPER

    Vol: E78-D  No: 6  Page(s): 732-737

    Three-dimensional vocal tract shapes of a male, a female, and a child subject are measured from magnetic resonance (MR) images during sustained phonation of the Japanese vowels /a, i, u, e, o/. Non-uniform dimensional differences in the vocal tract shapes of the subjects are quantitatively measured. Vocal tract area functions of the female and child subjects are normalized to those of the male on the basis of non-uniform and uniform scalings of the vocal tract length and compared with each other. A comparison is also made between the formant frequencies computed from the area functions normalized by the two different scalings. The comparisons suggest that non-uniformity in the vocal tract dimensions is not essential in the normalization of the five Japanese vowels.
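    The difference between the two length scalings can be illustrated with a small sketch. It assumes an area function stored as a list of (section length, area) pairs from glottis to lips, and it assumes that "non-uniform" means separate scale factors for a back (pharyngeal) and a front (oral) part; the abstract does not define the scaling in detail. Only lengths are rescaled and areas are left unchanged, which is a simplification. All names and numbers are hypothetical.

```python
def uniform_scale(sections, target_length):
    """Scale every section length by one common factor."""
    total = sum(length for length, _ in sections)
    k = target_length / total
    return [(length * k, area) for length, area in sections]

def nonuniform_scale(sections, split, target_back, target_front):
    """Scale the back and front parts of the tract with separate factors.

    `split` is the index of the first front (oral) section (an assumption).
    """
    back = uniform_scale(sections[:split], target_back)
    front = uniform_scale(sections[split:], target_front)
    return back + front

def tract_length(sections):
    return sum(length for length, _ in sections)

# Hypothetical child area function: ten 1.1 cm sections, areas in cm^2.
child = [(1.1, a) for a in (0.8, 1.0, 1.5, 2.0, 2.5, 3.0, 2.5, 2.0, 1.5, 1.0)]

# Normalize to a hypothetical 17 cm male tract (8 cm back, 9 cm front).
child_uniform = uniform_scale(child, 17.0)
child_nonuniform = nonuniform_scale(child, 4, 8.0, 9.0)
```

    Both results have the same total length, but the non-uniform version stretches the back sections more than the front ones; formant frequencies computed from the two normalized area functions could then be compared, as the abstract describes.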
