Posteriori Restoration of Turn-Taking and ASR Results for Incorrectly Segmented Utterances

Kazunori KOMATANI, Naoki HOTTA, Satoshi SATO, Mikio NAKANO

Summary:

Appropriate turn-taking is as important in spoken dialogue systems as generating correct responses. Especially when the dialogue features quick responses, voice activity detection (VAD) often segments a user utterance incorrectly because of short pauses within it. Incorrectly segmented utterances cause problems in both the automatic speech recognition (ASR) results and turn-taking: an incorrect VAD result leads to ASR errors and causes the system to start responding while the user is still speaking. We develop a method that performs a posteriori restoration of incorrectly segmented utterances and implement it as a plug-in for the MMDAgent open-source software. A crucial part of the method is deciding whether restoration is required. We cast this as a binary classification problem: detecting pairs of utterance fragments that were originally a single utterance. The features used represent timing, prosody, and ASR result information. Experiments show that the proposed method outperformed a baseline with manually selected features by 4.8% and 3.9% in cross-domain evaluations with two domains. More detailed analysis revealed that the dominant, domain-independent features were utterance intervals and results from the Gaussian mixture model (GMM).
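
To make the classification step concrete, the following is a minimal sketch of a binary decision over a pair of utterance fragments. It is not the authors' implementation: the `FragmentPair` type, the field names, the logistic form, the weights, and the threshold are all assumptions chosen for illustration. Only the general idea, using the interval between fragments and a GMM-derived value as features, comes from the summary above.

```python
import math
from dataclasses import dataclass


@dataclass
class FragmentPair:
    """Two consecutive VAD segments; every field here is illustrative."""
    first_end: float      # end time of the first fragment, in seconds
    second_start: float   # start time of the second fragment, in seconds
    gmm_score: float      # placeholder for a GMM-derived feature (definition assumed)

    @property
    def interval(self) -> float:
        """Silence between the two fragments, in seconds."""
        return self.second_start - self.first_end


# Hypothetical weights. In a real system they would be learned from
# labeled fragment pairs rather than set by hand.
WEIGHTS = {"bias": 1.0, "interval": -4.0, "gmm_score": 1.5}


def restoration_probability(pair: FragmentPair) -> float:
    """Logistic score that the pair was originally a single utterance."""
    z = (
        WEIGHTS["bias"]
        + WEIGHTS["interval"] * pair.interval
        + WEIGHTS["gmm_score"] * pair.gmm_score
    )
    return 1.0 / (1.0 + math.exp(-z))


def needs_restoration(pair: FragmentPair, threshold: float = 0.5) -> bool:
    """Binary decision: should the two fragments be rejoined?"""
    return restoration_probability(pair) >= threshold


if __name__ == "__main__":
    short_gap = FragmentPair(first_end=1.0, second_start=1.1, gmm_score=0.5)
    long_gap = FragmentPair(first_end=1.0, second_start=3.0, gmm_score=-0.5)
    print(needs_restoration(short_gap), needs_restoration(long_gap))
```

With these assumed weights, a short interval pushes the score toward rejoining the fragments and a long interval pushes it away. Prosodic and ASR-result features would enter the same weighted sum as additional terms.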

Publication
IEICE TRANSACTIONS on Information and Systems, Vol.E98-D, No.11, pp.1923-1931
Publication Date
2015/11/01
Publicized
2015/07/24
Online ISSN
1745-1361
DOI
10.1587/transinf.2015EDP7014
Type of Manuscript
PAPER
Category
Speech and Hearing

Authors

Kazunori KOMATANI
  Osaka University
Naoki HOTTA
  Nagoya University
Satoshi SATO
  Nagoya University
Mikio NAKANO
  Honda Research Institute Japan, Co., Ltd.
