In this letter, high-quality approaches for reconstructing speech from Mel-frequency cepstral coefficients (MFCCs) are presented. Taking into account the nonnegative and sparse properties of the speech power spectrum, nonnegative l2-norm (NL2) and weighted nonnegative l2-norm (NWL2) minimization approaches based on the alternating direction method of multipliers (ADMM) are proposed to cope with the under-determined nature of the reconstruction problem. The phase spectrum is recovered with the well-known LSE-ISTFTM algorithm. Experimental results demonstrate that the NL2 and NWL2 approaches achieve substantially better reconstructed speech quality than the conventional l2-norm minimization approach: with high-resolution MFCCs, the reconstructed speech sounds very close to the original, and its PESQ score reaches 4.0.
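A minimal NumPy sketch of the power-spectrum step, under assumptions not taken from the letter: MFCCs are the orthonormal DCT-II of log mel-band energies computed from the power spectrum with an HTK-style triangular filterbank M, and the nonnegative l2 problem is posed as minimizing 0.5 * ||w * p||^2 subject to M p = m and p >= 0, solved with scaled-form ADMM (the p-update solves the equality-constrained quadratic in closed form; the z-update clips to p >= 0). With w = 1 this is an NL2-style problem; the NWL2 weights used in the letter are not given in the abstract, so the `weights` argument is a hypothetical placeholder. The conventional l2 baseline is assumed to be the minimum-norm solution pinv(M) m. All function names are illustrative, and phase recovery with LSE-ISTFTM is not shown.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale (an assumption; the letter's mel variant is not stated)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filterbank M of shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    M = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        M[i] = np.maximum(0.0, np.minimum(rising, falling))
    return M


def dct_matrix(n):
    """Orthonormal DCT-II matrix D (D @ D.T == I), so its inverse is D.T."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0] /= np.sqrt(2.0)
    return D


def mfcc_to_mel_energy(mfcc, n_mels):
    """Undo DCT and log: MFCC vector -> mel-band energies m (missing orders set to 0)."""
    c = np.zeros(n_mels)
    c[: len(mfcc)] = mfcc
    return np.exp(dct_matrix(n_mels).T @ c)


def min_norm_l2(M, m):
    """Assumed l2 baseline: minimum-norm solution of M p = m (entries may be negative)."""
    return np.linalg.pinv(M) @ m


def nonneg_l2_admm(M, m, weights=None, rho=1.0, n_iter=5000):
    """Scaled-form ADMM for  min 0.5*||w * p||^2  s.t.  M p = m,  p >= 0.

    weights=None gives the unweighted (NL2-style) problem; a weight vector gives a
    weighted (NWL2-style) variant with hypothetical weights. Returns the iterate z >= 0.
    """
    n = M.shape[1]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    q = w ** 2 + rho                      # diagonal of W^2 + rho*I
    A = (M / q) @ M.T                     # M diag(1/q) M^T, size n_mels x n_mels
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(n_iter):
        v = z - u
        # p-update: equality-constrained quadratic, solved through its KKT system.
        lam = np.linalg.solve(A, M @ (rho * v / q) - m)
        p = (rho * v - M.T @ lam) / q
        # z-update: projection onto the nonnegative orthant.
        z = np.maximum(p + u, 0.0)
        # Scaled dual update for the consensus constraint p = z.
        u += p - z
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_mels = 20
    M = mel_filterbank(n_mels, n_fft=256, sr=8000)
    p_true = rng.random(M.shape[1])                  # stand-in power spectrum, not speech
    mfcc = dct_matrix(n_mels) @ np.log(M @ p_true)   # all n_mels orders kept ("high resolution")
    m = mfcc_to_mel_energy(mfcc, n_mels)
    p_l2 = min_norm_l2(M, m)
    p_nl2 = nonneg_l2_admm(M, m)
    print("l2 min entry:", p_l2.min(), " NL2 min entry:", p_nl2.min())
    print("NL2 relative mel residual:", np.linalg.norm(M @ p_nl2 - m) / np.linalg.norm(m))
```

Enforcing M p = m exactly keeps both ADMM updates closed-form, but it presumes the mel energies are reachable by some p >= 0; for truncated (low-resolution) MFCCs that may not hold, and a penalized data term such as 0.5 * ||M p - m||^2 would be a more forgiving choice.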
Gang MIN
PLA University of Science and Technology
Xiong wei ZHANG
PLA University of Science and Technology
Ji bin YANG
PLA University of Science and Technology
Xia ZOU
PLA University of Science and Technology
Zhi song PAN
PLA University of Science and Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Gang MIN, Xiong wei ZHANG, Ji bin YANG, Xia ZOU, Zhi song PAN, "Speech Reconstruction from MFCC Based on Nonnegative and Sparse Priors," in IEICE TRANSACTIONS on Fundamentals, vol. E98-A, no. 7, pp. 1540-1543, July 2015, doi: 10.1587/transfun.E98.A.1540.
URL: https://globals.ieice.org/en_transactions/fundamentals/10.1587/transfun.E98.A.1540/_p
@ARTICLE{e98-a_7_1540,
author={Gang MIN and Xiong wei ZHANG and Ji bin YANG and Xia ZOU and Zhi song PAN},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Speech Reconstruction from {MFCC} Based on Nonnegative and Sparse Priors},
year={2015},
volume={E98-A},
number={7},
pages={1540--1543},
abstract={In this letter, high-quality approaches for reconstructing speech from Mel-frequency cepstral coefficients (MFCCs) are presented. Taking into account the nonnegative and sparse properties of the speech power spectrum, nonnegative l2-norm (NL2) and weighted nonnegative l2-norm (NWL2) minimization approaches based on the alternating direction method of multipliers (ADMM) are proposed to cope with the under-determined nature of the reconstruction problem. The phase spectrum is recovered with the well-known LSE-ISTFTM algorithm. Experimental results demonstrate that the NL2 and NWL2 approaches achieve substantially better reconstructed speech quality than the conventional l2-norm minimization approach: with high-resolution MFCCs, the reconstructed speech sounds very close to the original, and its PESQ score reaches 4.0.},
keywords={},
doi={10.1587/transfun.E98.A.1540},
ISSN={1745-1337},
month={July},}
TY  - JOUR
TI  - Speech Reconstruction from MFCC Based on Nonnegative and Sparse Priors
T2  - IEICE TRANSACTIONS on Fundamentals
SP  - 1540
EP  - 1543
AU  - Gang MIN
AU  - Xiong wei ZHANG
AU  - Ji bin YANG
AU  - Xia ZOU
AU  - Zhi song PAN
PY  - 2015
DO  - 10.1587/transfun.E98.A.1540
JO  - IEICE TRANSACTIONS on Fundamentals
SN  - 1745-1337
VL  - E98-A
IS  - 7
JA  - IEICE TRANSACTIONS on Fundamentals
Y1  - July 2015
AB  - In this letter, high-quality approaches for reconstructing speech from Mel-frequency cepstral coefficients (MFCCs) are presented. Taking into account the nonnegative and sparse properties of the speech power spectrum, nonnegative l2-norm (NL2) and weighted nonnegative l2-norm (NWL2) minimization approaches based on the alternating direction method of multipliers (ADMM) are proposed to cope with the under-determined nature of the reconstruction problem. The phase spectrum is recovered with the well-known LSE-ISTFTM algorithm. Experimental results demonstrate that the NL2 and NWL2 approaches achieve substantially better reconstructed speech quality than the conventional l2-norm minimization approach: with high-resolution MFCCs, the reconstructed speech sounds very close to the original, and its PESQ score reaches 4.0.
ER  - 