Image-to-sound mapping is a technique that transforms an image into a sound signal by treating the image as a sound spectrogram. In general, the transformed sound differs from a human speech signal. Herein, an efficient image-to-sound mapping method that provides an understandable speech signal without any training is proposed. To synthesize such a speech signal, the proposed method utilizes a multi-column image and a speech spectral phase obtained from a long-time observation of speech. The original image can be retrieved from the sound spectrogram of the synthesized speech signal. The qualities of the synthesized speech and of the reconstructed image are evaluated using objective tests.
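The general idea in the abstract, treating an image as a magnitude spectrogram, attaching a phase, and inverting to a waveform from which the image can be read back, can be illustrated with a minimal sketch. This is not the authors' method: the function names `image_to_sound` and `sound_to_image` are hypothetical, a random phase stands in for the speech spectral phase, the multi-column image layout is not modeled, and frames are assumed to be non-overlapping with a rectangular window so that the round trip is exact.

```python
import cmath
import math
import random


def image_to_sound(image, phase):
    """Map an image to real audio samples (illustrative sketch only).

    image[k][t] is a non-negative magnitude for frequency bin k, frame t.
    phase[k][t] is the phase in radians assigned to that bin.
    Assumption: frames do not overlap, so each image column becomes one
    block of n = 2 * (n_bins - 1) samples.
    """
    n_bins, n_frames = len(image), len(image[0])
    n = 2 * (n_bins - 1)
    signal = []
    for t in range(n_frames):
        half = [image[k][t] * cmath.exp(1j * phase[k][t]) for k in range(n_bins)]
        # DC and Nyquist bins must be real for the frame to be real-valued.
        half[0] = complex(image[0][t], 0.0)
        half[-1] = complex(image[-1][t], 0.0)
        # Hermitian extension to a full n-point spectrum.
        full = half + [half[k].conjugate() for k in range(n_bins - 2, 0, -1)]
        # Naive inverse DFT (fine for a small demonstration).
        for m in range(n):
            s = sum(full[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n))
            signal.append(s.real / n)
    return signal


def sound_to_image(signal, n_bins):
    """Recover the image as the magnitude spectrogram of the signal."""
    n = 2 * (n_bins - 1)
    n_frames = len(signal) // n
    image = [[0.0] * n_frames for _ in range(n_bins)]
    for t in range(n_frames):
        frame = signal[t * n:(t + 1) * n]
        for k in range(n_bins):
            x = sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            image[k][t] = abs(x)
    return image


# Usage: a 5-bin by 3-frame "image" with a random placeholder phase.
rng = random.Random(0)
image = [[rng.random() for _ in range(3)] for _ in range(5)]
phase = [[rng.uniform(-math.pi, math.pi) for _ in range(3)] for _ in range(5)]
signal = image_to_sound(image, phase)
recovered = sound_to_image(signal, n_bins=5)
max_error = max(abs(image[k][t] - recovered[k][t]) for k in range(5) for t in range(3))
print("max reconstruction error:", max_error)
```

With overlapping, windowed frames, as in a typical short-time Fourier transform, the recovered magnitudes would generally not match the image exactly, which is one reason the choice of phase matters for both sound and image quality.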
Arata KAWAMURA
Osaka University
Hiro IGARASHI
Osaka University
Youji IIGUNI
Osaka University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Arata KAWAMURA, Hiro IGARASHI, Youji IIGUNI, "An Efficient Image to Sound Mapping Method Using Speech Spectral Phase and Multi-Column Image" in IEICE TRANSACTIONS on Fundamentals,
vol. E100-A, no. 3, pp. 893-895, March 2017, doi: 10.1587/transfun.E100.A.893.
URL: https://globals.ieice.org/en_transactions/fundamentals/10.1587/transfun.E100.A.893/_p
@ARTICLE{e100-a_3_893,
author={Arata KAWAMURA and Hiro IGARASHI and Youji IIGUNI},
journal={IEICE TRANSACTIONS on Fundamentals},
title={An Efficient Image to Sound Mapping Method Using Speech Spectral Phase and Multi-Column Image},
year={2017},
volume={E100-A},
number={3},
pages={893-895},
keywords={},
doi={10.1587/transfun.E100.A.893},
ISSN={1745-1337},
month={March},}
TY - JOUR
TI - An Efficient Image to Sound Mapping Method Using Speech Spectral Phase and Multi-Column Image
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 893
EP - 895
AU - Arata KAWAMURA
AU - Hiro IGARASHI
AU - Youji IIGUNI
PY - 2017
DO - 10.1587/transfun.E100.A.893
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E100-A
IS - 3
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - March 2017
ER -