A novel online speaker clustering method based on a generative model is proposed. It employs an incremental variant of variational Bayesian learning and provides probabilistic (non-deterministic) decisions for each input utterance, on the basis of the history of preceding utterances. It can be expected to be robust against errors in cluster estimation and the classification of utterances, and hence to be applicable to many real-time applications. Experimental results show that it produces 50% fewer classification errors than does a conventional online method. They also show that it is possible to reduce the number of speech recognition errors by combining the method with unsupervised speaker adaptation.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Takafumi KOSHINAKA, Kentaro NAGATOMO, Koichi SHINODA, "Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model," in IEICE TRANSACTIONS on Information, vol. E95-D, no. 10, pp. 2469-2478, October 2012, doi: 10.1587/transinf.E95.D.2469.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.E95.D.2469/_p
@ARTICLE{e95-d_10_2469,
author={Takafumi KOSHINAKA and Kentaro NAGATOMO and Koichi SHINODA},
journal={IEICE TRANSACTIONS on Information},
title={Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model},
year={2012},
volume={E95-D},
number={10},
pages={2469-2478},
abstract={A novel online speaker clustering method based on a generative model is proposed. It employs an incremental variant of variational Bayesian learning and provides probabilistic (non-deterministic) decisions for each input utterance, on the basis of the history of preceding utterances. It can be expected to be robust against errors in cluster estimation and the classification of utterances, and hence to be applicable to many real-time applications. Experimental results show that it produces 50% fewer classification errors than does a conventional online method. They also show that it is possible to reduce the number of speech recognition errors by combining the method with unsupervised speaker adaptation.},
keywords={},
doi={10.1587/transinf.E95.D.2469},
ISSN={1745-1361},
month={October},
}
TY - JOUR
TI - Online Speaker Clustering Using Incremental Learning of an Ergodic Hidden Markov Model
T2 - IEICE TRANSACTIONS on Information
SP - 2469
EP - 2478
AU - Takafumi KOSHINAKA
AU - Kentaro NAGATOMO
AU - Koichi SHINODA
PY - 2012
DO - 10.1587/transinf.E95.D.2469
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E95-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2012
AB - A novel online speaker clustering method based on a generative model is proposed. It employs an incremental variant of variational Bayesian learning and provides probabilistic (non-deterministic) decisions for each input utterance, on the basis of the history of preceding utterances. It can be expected to be robust against errors in cluster estimation and the classification of utterances, and hence to be applicable to many real-time applications. Experimental results show that it produces 50% fewer classification errors than does a conventional online method. They also show that it is possible to reduce the number of speech recognition errors by combining the method with unsupervised speaker adaptation.
ER -