Author Search Result

[Author] Naoto IWAHASHI (5 hits)

  • Discriminating Unknown Objects from Known Objects Using Image and Speech Information

    Yuko OZASA  Mikio NAKANO  Yasuo ARIKI  Naoto IWAHASHI  

     
    PAPER-Multimedia Pattern Processing

  Publicized:
    2014/12/16
      Vol:
    E98-D No:3
      Page(s):
    704-711

This paper deals with a problem in which a robot identifies an object that a human asks it, by voice, to bring, when there is a set of objects that both the human and the robot can see. When the robot knows the requested object, it must identify it; when it does not know the object, it must say so. This paper presents a new method for discriminating unknown objects from known objects using object images and human speech. It uses a confidence measure that integrates image recognition confidences and speech recognition confidences based on logistic regression.
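The fusion step described above can be sketched as a logistic regression over the two confidence scores. The weights, bias, and decision threshold below are illustrative assumptions, not the paper's fitted values, which would be learned from labelled known/unknown examples.

```python
import math

def combined_confidence(image_conf, speech_conf, w_img=1.0, w_sp=1.0, bias=-1.0):
    """Fuse image and speech recognition confidences with logistic regression.

    The weights and bias are hypothetical placeholders; in practice they
    would be fitted to labelled known/unknown training examples.
    """
    z = w_img * image_conf + w_sp * speech_conf + bias
    return 1.0 / (1.0 + math.exp(-z))

def is_known(image_conf, speech_conf, threshold=0.5):
    """Declare the object 'known' only if the fused confidence clears a threshold."""
    return combined_confidence(image_conf, speech_conf) >= threshold
```

With both recognizers confident the fused score clears the threshold; with both near zero it does not, and the robot would report that the object is unknown.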

  • Speech Segment Selection for Concatenative Synthesis Based on Spectral Distortion Minimization

    Naoto IWAHASHI  Nobuyoshi KAIKI  Yoshinori SAGISAKA  

     
    PAPER

      Vol:
    E76-A No:11
      Page(s):
    1942-1948

This paper proposes a new scheme for concatenative speech synthesis to improve the speech segment selection procedure. The proposed scheme selects a segment sequence for concatenation by minimizing acoustic distortions between the selected segment and the desired spectrum for the target without the use of heuristics. Four types of distortion, a) the spectral prototypicality of a segment, b) the spectral difference between the source and target contexts, c) the degradation resulting from concatenation of phonemes, and d) the acoustic discontinuity between the concatenated segments, are formulated as acoustic quantities and used as measures for minimization. A search method for selecting segments from a large speech database is also described. In this method, a three-step optimization using dynamic programming is used to minimize the four types of distortion. A perceptual test shows that this proposed segment selection method with minimum distortion criteria produces high-quality synthesized speech, and that contextual spectral difference and acoustic discontinuity at the segment boundary are important measures for improving the quality.
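The paper's three-step optimization is more elaborate, but the core idea of selecting a minimum-distortion segment sequence by dynamic programming can be sketched as follows. Here `target_cost` and `concat_cost` are hypothetical stand-ins: the former lumps together distortions (a)-(c) for a segment against its target position, the latter covers the boundary discontinuity (d).

```python
def select_segments(candidates, target_cost, concat_cost):
    """Viterbi-style dynamic programming over candidate segments.

    candidates[t] is the list of database segments available for target
    position t. Returns the segment sequence with minimum total distortion.
    """
    # best[t][i] = (accumulated cost, backpointer) for candidate i at position t
    best = [[(target_cost(s, 0), None) for s in candidates[0]]]
    for t in range(1, len(candidates)):
        row = []
        for s in candidates[t]:
            cost, back = min(
                (best[t - 1][i][0] + concat_cost(p, s), i)
                for i, p in enumerate(candidates[t - 1]))
            row.append((cost + target_cost(s, t), back))
        best.append(row)
    # trace back the minimum-distortion sequence
    i = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][i])
        i = best[t][i][1]
    return list(reversed(path))
```

The complexity is linear in the number of target positions and quadratic in the number of candidates per position, which is what makes exhaustive search over a large speech database tractable.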

  • Statistical Modelling of Speech Segment Duration by Constrained Tree Regression

    Naoto IWAHASHI  Yoshinori SAGISAKA  

     
    PAPER-Speech and Hearing

      Vol:
    E83-D No:7
      Page(s):
    1550-1559

This paper presents a new method for statistical modelling of prosody control in speech synthesis. The proposed method, referred to as Constrained Tree Regression (CTR), can suitably represent the complex effects of prosody control factors with a moderate amount of learning data. It is based on recursive splits of predictor variable spaces and partial imposition of linear-independence constraints among predictor variables. It incorporates both linear and tree regressions with categorical predictor variables, which have conventionally been used for prosody control, and extends them to more general models. In addition, a hierarchical error function is presented to account for hierarchical structure in prosody control. This new method is applied to the modelling of speech segment duration. Experimental results show that better duration models are obtained by the proposed regression method than by linear and tree regressions using the same number of free parameters. It is also shown that the hierarchical structure of phoneme and syllable durations can be represented efficiently using the hierarchical error function.
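As a rough illustration of the idea behind CTR (combining recursive tree splits with linear regression at the nodes), a toy one-predictor model tree might look like the sketch below. It omits the paper's constraints, categorical predictors, and hierarchical error function; the function names and split criterion are assumptions for illustration.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return a, my - a * mx

def sse(xs, ys, model):
    """Sum of squared errors of a linear model on the data."""
    a, b = model
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

def model_tree(xs, ys, min_leaf=3):
    """Recursively split the predictor space, fitting a linear model per leaf."""
    model = fit_linear(xs, ys)
    best = (sse(xs, ys, model), None)
    for t in sorted(set(xs))[1:]:
        left = [(x, y) for x, y in zip(xs, ys) if x < t]
        right = [(x, y) for x, y in zip(xs, ys) if x >= t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        err = (sse(*zip(*left), fit_linear(*zip(*left))) +
               sse(*zip(*right), fit_linear(*zip(*right))))
        if err < best[0]:
            best = (err, t)
    if best[1] is None:
        return ("leaf", model)          # no split improves the fit
    t = best[1]
    left = [(x, y) for x, y in zip(xs, ys) if x < t]
    right = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return ("split", t, model_tree(*zip(*left), min_leaf),
            model_tree(*zip(*right), min_leaf))
```

On piecewise-linear data such a tree recovers the breakpoint and fits each piece exactly, which a single linear regression or a constant-leaf tree with the same number of parameters cannot do.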

  • Interactive Learning of Spoken Words and Their Meanings Through an Audio-Visual Interface

    Naoto IWAHASHI  

     
    PAPER-Speech and Hearing

      Vol:
    E91-D No:2
      Page(s):
    312-321

    This paper presents a new interactive learning method for spoken word acquisition through human-machine audio-visual interfaces. During the course of learning, the machine makes a decision about whether an orally input word is a word in the lexicon the machine has learned, using both speech and visual cues. Learning is carried out on-line, incrementally, based on a combination of active and unsupervised learning principles. If the machine judges with a high degree of confidence that its decision is correct, it learns the statistical models of the word and a corresponding image category as its meaning in an unsupervised way. Otherwise, it asks the user a question in an active way. The function used to estimate the degree of confidence is also learned adaptively on-line. Experimental results show that the combination of active and unsupervised learning principles enables the machine and the user to adapt to each other, which makes the learning process more efficient.
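The accept-or-ask decision described above can be sketched as follows. Averaging the two cues and using a fixed threshold are simplifying assumptions; in the paper, the confidence function itself is learned adaptively on-line.

```python
def learning_step(speech_conf, visual_conf, accept_threshold=0.8):
    """One interactive step: learn unsupervised if confident, otherwise ask.

    The averaging and the fixed threshold are illustrative stand-ins for
    the adaptively learned confidence function described in the paper.
    """
    confidence = 0.5 * (speech_conf + visual_conf)
    if confidence >= accept_threshold:
        return "update_models"   # unsupervised: accept the machine's own decision
    return "ask_user"            # active: query the user for confirmation
```

The interplay of the two branches is what lets the machine and the user adapt to each other: confident decisions grow the lexicon silently, while uncertain ones trigger a clarifying question.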

  • Training Method for Pattern Classifier Based on the Performance after Adaptation

    Naoto IWAHASHI  

     
    PAPER-Speech and Hearing

      Vol:
    E83-D No:7
      Page(s):
    1560-1566

This paper describes a method for training a pattern classifier so that it will perform well after it has been adapted to changes in input conditions. Considering adaptation methods based on the transformation of classifier parameters, we formulate the classifier optimization problem and propose a training method. In the proposed method, the classifier is trained while the adaptation is being carried out, and the objective function for training is based on the recognition performance of the adapted classifier. The utility of the proposed training method is demonstrated by experiments on a five-class Japanese vowel recognition task with speaker adaptation.
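The objective described above can be sketched as the average loss measured after adaptation to each condition (e.g. each speaker). The names `adapt` and `loss` are hypothetical placeholders for the parameter transformation and the recognition-error criterion; the point is only that the loss is evaluated on the transformed parameters, not the original ones.

```python
def post_adaptation_loss(params, adapt, datasets, loss):
    """Objective: average loss of the classifier *after* adaptation.

    adapt(params, data) returns the transformed parameters for one
    condition; loss(adapted_params, data) scores the adapted classifier.
    """
    total = 0.0
    for data in datasets:
        adapted = adapt(params, data)   # e.g. a speaker-dependent transform
        total += loss(adapted, data)    # performance measured post-adaptation
    return total / len(datasets)
```

Minimizing this quantity during training, rather than the pre-adaptation loss, is what biases the classifier toward parameter settings that remain good once the transformation is applied.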
