Multimodal Learning of Geometry-Preserving Binary Codes for Semantic Image Retrieval

Go IRIE; Hiroyuki ARAI; Yukinobu TANIGUCHI

doi:10.1587/transinf.2016AWI0003

Open Access

Summary:

This paper presents an unsupervised approach to binary coding of features for efficient semantic image retrieval. Although most existing methods aim to preserve the neighborhood structure of the feature space, semantically similar images do not always lie in such neighborhoods; rather, they are distributed on non-linear low-dimensional manifolds. Moreover, images on the Internet rarely appear alone and are often surrounded by text data such as tags, attributes, and captions, which tend to carry rich semantic information about the images. On the basis of these observations, the approach aims to learn binary codes for semantic image retrieval from multimodal information sources while preserving the essential low-dimensional structure of the data distribution in the Hamming space. Specifically, after finding the low-dimensional structure of the data with an unsupervised sparse coding technique, the approach learns a set of linear projections for binary coding by solving an optimization problem designed to jointly preserve, as far as possible, the extracted data structure and the multimodal correlations between images and texts in the Hamming space. We show that the joint optimization problem can readily be transformed into a generalized eigenproblem that can be solved efficiently. Extensive experiments demonstrate that our method yields significant performance gains over several existing methods.
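The general pattern named in the summary (linear projections obtained from a generalized eigenproblem, then thresholded into binary codes) can be illustrated with a minimal sketch. This is not the authors' objective: the affinity matrix `S` below is a plain Gaussian kernel standing in for the sparse-coding structure and image-text correlation terms, and the function names `learn_projections`, `encode`, and `hamming` are hypothetical, as are the toy data and the regularizer `reg`.

```python
import numpy as np

def learn_projections(X, S, n_bits, reg=1e-6):
    """Return W (d x n_bits) solving A w = lambda B w for the largest eigenvalues.

    A = X^T S X encodes the structure to preserve (S is an assumed affinity).
    B = X^T X + reg * I fixes the scale and is positive definite.
    """
    A = X.T @ S @ X
    A = (A + A.T) / 2.0                      # enforce symmetry numerically
    B = X.T @ X + reg * np.eye(X.shape[1])
    L = np.linalg.cholesky(B)                # B = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ A @ L_inv.T                  # ordinary symmetric eigenproblem
    M = (M + M.T) / 2.0
    _, vecs = np.linalg.eigh(M)              # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :n_bits]          # eigenvectors of the largest ones
    return L_inv.T @ top                     # map back: w = L^{-T} u

def encode(X, W):
    """Binary codes: one bit per projection, thresholded at zero."""
    return (X @ W > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two codes."""
    return int(np.count_nonzero(a != b))

# Toy data (assumption): 40 random 8-dimensional features, zero-centered.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
X -= X.mean(axis=0)

# Stand-in affinity (assumption): Gaussian kernel on pairwise distances.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
S = np.exp(-d2 / d2.mean())

W = learn_projections(X, S, n_bits=4)
codes = encode(X, W)
print(codes[:3], hamming(codes[0], codes[1]))
```

Retrieval then amounts to ranking database codes by Hamming distance to the query code. The Cholesky reduction is used here only to keep the sketch dependent on NumPy alone; a dedicated generalized symmetric eigensolver would serve equally well.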

Publication: IEICE TRANSACTIONS on Information Vol.E100-D No.4 pp.600-609

Publication Date: 2017/04/01

Publicized: 2017/01/06

Online ISSN: 1745-1361

DOI: 10.1587/transinf.2016AWI0003

Type of Manuscript: Special Section INVITED PAPER (Special Section on Award-winning Papers)

Authors

Go IRIE
  Nippon Telegraph & Telephone Corporation
Hiroyuki ARAI
  Nippon Telegraph & Telephone Corporation
Yukinobu TANIGUCHI
  Tokyo University of Science

Keywords

image retrieval, multimodal learning, binary coding

Cite this

Go IRIE, Hiroyuki ARAI, Yukinobu TANIGUCHI, "Multimodal Learning of Geometry-Preserving Binary Codes for Semantic Image Retrieval" in IEICE TRANSACTIONS on Information, vol. E100-D, no. 4, pp. 600-609, April 2017, doi: 10.1587/transinf.2016AWI0003.
URL: https://globals.ieice.org/en_transactions/information/10.1587/transinf.2016AWI0003/_p

@ARTICLE{e100-d_4_600,
author={Go IRIE and Hiroyuki ARAI and Yukinobu TANIGUCHI},
journal={IEICE TRANSACTIONS on Information},
title={Multimodal Learning of Geometry-Preserving Binary Codes for Semantic Image Retrieval},
year={2017},
volume={E100-D},
number={4},
pages={600-609},
keywords={image retrieval, multimodal learning, binary coding},
doi={10.1587/transinf.2016AWI0003},
ISSN={1745-1361},
month={April},}

TY - JOUR
TI - Multimodal Learning of Geometry-Preserving Binary Codes for Semantic Image Retrieval
T2 - IEICE TRANSACTIONS on Information
SP - 600
EP - 609
AU - Go IRIE
AU - Hiroyuki ARAI
AU - Yukinobu TANIGUCHI
PY - 2017
DO - 10.1587/transinf.2016AWI0003
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E100-D
IS - 4
JA - IEICE TRANSACTIONS on Information
Y1 - April 2017
ER -