Noise Robust Speech Recognition Using F₀ Contour Information

Koji IWANO; Takahiro SEKI; Sadaoki FURUI

Summary:

This paper proposes a noise robust speech recognition method using prosodic information. In Japanese, the fundamental frequency (F₀) contour represents phrase intonation and word accent information. Consequently, it conveys information about prosodic phrases and word boundaries. This paper first describes a noise robust F₀ extraction method using the Hough transform, which achieves high extraction rates under various noise environments. Then it proposes a robust speech recognition method using multi-stream HMMs which model both segmental spectral and F₀ contour information. Speaker-independent experiments are conducted using connected digits uttered by 11 male speakers in various kinds of noise and SNR conditions. The recognition error rate is reduced in all noise conditions, and the best absolute improvement of digit accuracy is about 4.5%. This improvement is achieved by robust digit boundary detection using the prosodic information.
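
The multi-stream idea in the summary can be illustrated with a small sketch. It assumes, purely for illustration, that each HMM state holds one diagonal-covariance Gaussian per stream and that the two stream log-likelihoods are combined with exponent weights `w_spec` and `w_f0`; the summary does not state the feature definitions, mixture sizes, or weights used in the paper, so every name and number below is hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def multistream_log_likelihood(obs_spec, obs_f0, state, w_spec, w_f0):
    """Weighted sum of per-stream log-likelihoods for one HMM state.

    state["spec"] and state["f0"] are (mean, variance) pairs.
    The weights are hypothetical tuning parameters, not values from the paper.
    """
    ll_spec = log_gaussian_diag(obs_spec, *state["spec"])
    ll_f0 = log_gaussian_diag(obs_f0, *state["f0"])
    return w_spec * ll_spec + w_f0 * ll_f0


# Toy state: 2-dimensional spectral stream, 1-dimensional F0 stream.
state = {
    "spec": ([0.0, 0.0], [1.0, 1.0]),
    "f0": ([0.0], [1.0]),
}
score = multistream_log_likelihood([0.0, 0.0], [0.0], state, w_spec=0.8, w_f0=0.2)
print(score)
```

Setting `w_f0` to zero reduces the score to the spectral stream alone, which is one way to compare a baseline against the combined model under this assumed formulation.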

Publication: IEICE TRANSACTIONS on Information and Systems Vol.E87-D No.5 pp.1102-1109

Publication Date: 2004/05/01

Type of Manuscript: Special Section PAPER (Special Section on Speech Dynamics by Ear, Eye, Mouth and Machine)

Koji IWANO, Takahiro SEKI, Sadaoki FURUI, "Noise Robust Speech Recognition Using F0 Contour Information," in IEICE TRANSACTIONS on Information and Systems, vol. E87-D, no. 5, pp. 1102-1109, May 2004.
Abstract: This paper proposes a noise robust speech recognition method using prosodic information. In Japanese, the fundamental frequency (F₀) contour represents phrase intonation and word accent information. Consequently, it conveys information about prosodic phrases and word boundaries. This paper first describes a noise robust F₀ extraction method using the Hough transform, which achieves high extraction rates under various noise environments. Then it proposes a robust speech recognition method using multi-stream HMMs which model both segmental spectral and F₀ contour information. Speaker-independent experiments are conducted using connected digits uttered by 11 male speakers in various kinds of noise and SNR conditions. The recognition error rate is reduced in all noise conditions, and the best absolute improvement of digit accuracy is about 4.5%. This improvement is achieved by robust digit boundary detection using the prosodic information.
URL: https://globals.ieice.org/en_transactions/information/10.1587/e87-d_5_1102/_p

@ARTICLE{e87-d_5_1102,
author={Koji IWANO and Takahiro SEKI and Sadaoki FURUI},
journal={IEICE TRANSACTIONS on Information and Systems},
title={Noise Robust Speech Recognition Using F0 Contour Information},
year={2004},
volume={E87-D},
number={5},
pages={1102--1109},
abstract={This paper proposes a noise robust speech recognition method using prosodic information. In Japanese, the fundamental frequency (F₀) contour represents phrase intonation and word accent information. Consequently, it conveys information about prosodic phrases and word boundaries. This paper first describes a noise robust F₀ extraction method using the Hough transform, which achieves high extraction rates under various noise environments. Then it proposes a robust speech recognition method using multi-stream HMMs which model both segmental spectral and F₀ contour information. Speaker-independent experiments are conducted using connected digits uttered by 11 male speakers in various kinds of noise and SNR conditions. The recognition error rate is reduced in all noise conditions, and the best absolute improvement of digit accuracy is about 4.5%. This improvement is achieved by robust digit boundary detection using the prosodic information.},
month={May}
}

TY - JOUR
TI - Noise Robust Speech Recognition Using F0 Contour Information
T2 - IEICE TRANSACTIONS on Information and Systems
SP - 1102
EP - 1109
AU - Koji IWANO
AU - Takahiro SEKI
AU - Sadaoki FURUI
PY - 2004
JO - IEICE TRANSACTIONS on Information and Systems
VL - E87-D
IS - 5
JA - IEICE TRANSACTIONS on Information and Systems
Y1 - May 2004
AB - This paper proposes a noise robust speech recognition method using prosodic information. In Japanese, the fundamental frequency (F₀) contour represents phrase intonation and word accent information. Consequently, it conveys information about prosodic phrases and word boundaries. This paper first describes a noise robust F₀ extraction method using the Hough transform, which achieves high extraction rates under various noise environments. Then it proposes a robust speech recognition method using multi-stream HMMs which model both segmental spectral and F₀ contour information. Speaker-independent experiments are conducted using connected digits uttered by 11 male speakers in various kinds of noise and SNR conditions. The recognition error rate is reduced in all noise conditions, and the best absolute improvement of digit accuracy is about 4.5%. This improvement is achieved by robust digit boundary detection using the prosodic information.
ER -