Stochastic Divergence Minimization for Biterm Topic Models

Zhenghang CUI, Issei SATO, Masashi SUGIYAMA


Summary

With the emergence and thriving development of social networks, a huge number of short texts are accumulated and need to be processed. Inferring the latent topics of collected short texts is an essential task for understanding their hidden structure and predicting new content. The biterm topic model (BTM) was recently proposed for short texts; it overcomes the sparseness of document-level word co-occurrences by directly modeling the generation process of word pairs. Stochastic inference algorithms based on collapsed Gibbs sampling (CGS) and collapsed variational inference have been proposed for BTM. However, they either incur high computational complexity or rely on very crude estimation that does not preserve sufficient statistics. In this work, we develop a stochastic divergence minimization (SDM) inference algorithm for BTM to achieve better predictive likelihood in a scalable way. Experiments show that SDM-BTM trained on 30% of the data outperforms the best existing algorithm trained on the full data.
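To make the modeling assumption concrete, the following minimal Python sketch (not from the paper) illustrates how biterms, the unordered word pairs that BTM models directly, are typically extracted from a short document; the function name `extract_biterms` is introduced here for illustration only.

```python
from itertools import combinations

def extract_biterms(doc_tokens):
    """Extract all unordered word pairs (biterms) from a short document.

    BTM treats the corpus as a bag of such biterms rather than as
    per-document word counts, which alleviates the sparsity of
    document-level co-occurrence statistics in short texts.
    """
    return list(combinations(doc_tokens, 2))

# Example: a three-word document yields three biterms.
print(extract_biterms(["topic", "model", "text"]))
# [('topic', 'model'), ('topic', 'text'), ('model', 'text')]
```

Because every pair of words in a document forms a biterm, even very short texts contribute several co-occurrence observations, which is the motivation for modeling biterms rather than documents.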

Publication
IEICE TRANSACTIONS on Information and Systems Vol.E101-D No.3 pp.668-677
Publication Date
2018/03/01
Publicized
2017/12/20
Online ISSN
1745-1361
DOI
10.1587/transinf.2017EDP7310
Type of Manuscript
PAPER
Category
Data Engineering, Web Information Systems

Authors

Zhenghang CUI
  University of Tokyo
Issei SATO
University of Tokyo, RIKEN
Masashi SUGIYAMA
University of Tokyo, RIKEN

Keyword
