Keyword Search Result

[Keyword] generative model (13 hits)

Hits 1-13 of 13
  • Random-Based and Deep Graph Generators: Evolution and Future Prospects [Open Access]

    Kohei WATABE  

     
    INVITED PAPER

      Vol:
    E107-B No:12
      Page(s):
    918-927

    Graphs are highly flexible data structures that can model various data and relationships. By using graphs, we can abstract and represent various things in the real world. The technology of artificially generating graphs is important in the many engineering fields where graphs are applied, including communication networks, social networks, and so on. In this paper, we organize and introduce graph generation techniques from early random-based methods to the latest deep graph generators, focusing on the aspects of feature reproduction and feature specification. Techniques for reproducing and specifying graph features in graph generation may provide new research methods for classical graph theory and optimization problems on graphs. This paper also presents recent achievements that may lead to further exploration in these fields and discusses the future prospects of graph generation.
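
    As a concrete reference point for the "early random-based methods" mentioned above, the sketch below implements the classical Erdős-Rényi G(n, p) generator; it is an illustrative example only, not a method proposed in the paper.

    ```python
    # Illustrative sketch: the classical Erdos-Renyi G(n, p) random graph generator,
    # the kind of early random-based method the survey contrasts with deep generators.
    import random

    def erdos_renyi(n, p, seed=None):
        """Return an edge list in which each of the n*(n-1)/2 possible
        undirected edges is included independently with probability p."""
        rng = random.Random(seed)
        return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]

    edges = erdos_renyi(100, 0.05, seed=0)
    print(len(edges))  # expected to be near 0.05 * 100 * 99 / 2 = 247.5
    ```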

  • Multi-Task Learning of Japanese How-to Tip Machine Reading Comprehension by a Generative Model

    Xiaotian WANG  Tingxuan LI  Takuya TAMURA  Shunsuke NISHIDA  Takehito UTSURO  

     
    PAPER-Natural Language Processing

      Publicized:
    2023/10/23
      Vol:
    E107-D No:1
      Page(s):
    125-134

    In the research of machine reading comprehension for Japanese how-to tip QA tasks, conventional extractive machine reading comprehension methods have difficulty dealing with cases in which the answer string spans multiple locations in the context. The method of fine-tuning a BERT model for machine reading comprehension tasks is not suitable for such cases. In this paper, we trained a generative machine reading comprehension model for Japanese how-to tips by constructing a generative dataset based on the website “wikiHow” as a source of information. We then proposed two methods of multi-task learning to fine-tune the generative model. The first method is multi-task learning with a hybrid generative and extractive training dataset, where a single model is trained simultaneously on both the generative and extractive datasets. The second method is multi-task learning with inter-sentence semantic similarity and answer generation, where, in addition to the answer generation task, the model learns the semantic distance between the question/context sentences and the answer in the training examples. The evaluation results showed that both multi-task learning methods significantly outperformed the single-task learning method on generative question-and-answer examples. Of the two multi-task learning methods, the one with inter-sentence semantic similarity and answer generation performed best in the manual evaluation. The data and the code are available at https://github.com/EternalEdenn/multitask_ext-gen_sts-gen.
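
    The released code is at the GitHub link above; purely as a hedged PyTorch sketch of the second setup, the snippet below combines an answer-generation cross-entropy loss with an auxiliary inter-sentence similarity loss (the weight alpha and the tensor shapes are illustrative assumptions, not the authors' settings).

    ```python
    import torch
    import torch.nn.functional as F

    def multitask_loss(gen_logits, gen_targets, ctx_emb, ans_emb, sim_targets, alpha=0.5):
        # gen_logits: (batch, seq_len, vocab); gen_targets: (batch, seq_len) token ids
        gen_loss = F.cross_entropy(gen_logits.transpose(1, 2), gen_targets)
        # ctx_emb / ans_emb: (batch, dim) embeddings of the question/context and the answer
        sim_pred = F.cosine_similarity(ctx_emb, ans_emb, dim=-1)
        sim_loss = F.mse_loss(sim_pred, sim_targets)      # similarity targets in [0, 1]
        return gen_loss + alpha * sim_loss                # joint multi-task objective

    # toy tensors to confirm the shapes work
    logits = torch.randn(2, 5, 100)
    targets = torch.randint(0, 100, (2, 5))
    ctx, ans = torch.randn(2, 64), torch.randn(2, 64)
    print(multitask_loss(logits, targets, ctx, ans, torch.rand(2)))
    ```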

  • Multi-Scale Correspondence Learning for Person Image Generation

    Shi-Long SHEN  Ai-Guo WU  Yong XU  

     
    PAPER-Person Image Generation

      Publicized:
    2022/04/15
      Vol:
    E106-D No:5
      Page(s):
    804-812

    A generative model is presented for two types of person image generation in this paper. First, the model is applied to pose-guided person image generation, i.e., converting the pose of a source person image to a target pose while preserving the texture of the source person image. Second, the model is also used for clothing-guided person image generation, i.e., changing the clothing texture of a source person image to a desired clothing texture. The core idea of the proposed model is to establish multi-scale correspondence, which effectively addresses the misalignment introduced by transferring pose and thereby preserves richer appearance information. Specifically, the proposed model consists of two stages: 1) it first generates a target semantic map imposed on the target pose to provide more accurate guidance during the generation process; 2) after obtaining multi-scale feature maps from the encoder, the multi-scale correspondence is established, which is useful for fine-grained generation. Experimental results show that the proposed method is superior to state-of-the-art methods in pose-guided person image generation and demonstrate its effectiveness in clothing-guided person image generation.
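
    As a rough sketch of what a correspondence layer at a single scale can look like (illustrative assumptions only; the paper's architecture and parameterization are not reproduced here), target-pose features attend to source-appearance features so that appearance can be warped to the target pose:

    ```python
    import torch

    def correspondence(src_feat, tgt_feat):
        # src_feat, tgt_feat: (batch, channels, H, W) feature maps at one scale
        b, c, h, w = src_feat.shape
        src = src_feat.flatten(2)                                  # (b, c, H*W)
        tgt = tgt_feat.flatten(2)                                  # (b, c, H*W)
        attn = torch.softmax(tgt.transpose(1, 2) @ src / c ** 0.5, dim=-1)  # (b, HW, HW)
        warped = (attn @ src.transpose(1, 2)).transpose(1, 2)      # appearance warped to target
        return warped.view(b, c, h, w)

    out = correspondence(torch.randn(1, 32, 16, 16), torch.randn(1, 32, 16, 16))
    print(out.shape)  # torch.Size([1, 32, 16, 16]); applied at multiple scales in practice
    ```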

  • Enhanced Full Attention Generative Adversarial Networks

    KaiXu CHEN  Satoshi YAMANE  

     
    LETTER-Core Methods

      Publicized:
    2023/01/12
      Vol:
    E106-D No:5
      Page(s):
    813-817

    In this paper, we propose improved Generative Adversarial Networks with an attention module in the Generator, which enhances the effectiveness of the Generator. Furthermore, recent work has shown that Generator conditioning affects GAN performance. Leveraging this insight, we explored the effect of different normalization schemes (spectral normalization, instance normalization) on the Generator and Discriminator. Moreover, an enhanced loss function based on the Wasserstein divergence distance alleviates the difficulty of training the model in practice.
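
    The snippet below is a hedged sketch of two of the ingredients mentioned above, spectral normalization on the Discriminator and a Wasserstein-divergence (WGAN-div) style discriminator loss; the layer sizes and the coefficients k and p are illustrative assumptions, and the attention module and instance-normalized Generator are omitted.

    ```python
    import torch
    import torch.nn as nn
    from torch.nn.utils import spectral_norm

    # Discriminator with spectral normalization on every linear layer (toy sizes)
    disc = nn.Sequential(
        spectral_norm(nn.Linear(784, 256)), nn.LeakyReLU(0.2),
        spectral_norm(nn.Linear(256, 1)),
    )

    def wgan_div_d_loss(disc, real, fake, k=2.0, p=6.0):
        d_real, d_fake = disc(real), disc(fake)
        # gradient penalty of WGAN-div, computed on both real and fake samples
        grad_real = torch.autograd.grad(d_real.sum(), real, create_graph=True)[0]
        grad_fake = torch.autograd.grad(d_fake.sum(), fake, create_graph=True)[0]
        gp = (grad_real.norm(2, dim=1) ** p + grad_fake.norm(2, dim=1) ** p).mean() * k / 2
        return d_fake.mean() - d_real.mean() + gp

    real = torch.randn(8, 784, requires_grad=True)
    fake = torch.randn(8, 784, requires_grad=True)
    print(wgan_div_d_loss(disc, real, fake))
    ```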

  • Few-Shot Anomaly Detection Using Deep Generative Models for Grouped Data

    Kazuki SATO  Satoshi NAKATA  Takashi MATSUBARA  Kuniaki UEHARA  

     
    LETTER-Pattern Recognition

      Publicized:
    2021/10/25
      Vol:
    E105-D No:2
      Page(s):
    436-440

    There exists a great demand for automatic anomaly detection in industrial world. The anomaly has been defined as a group of samples that rarely or never appears. Given a type of products, one has to collect numerous samples and train an anomaly detector. When one diverts a model trained with old types of products with sufficient inventory to the new type, one can detect anomalies of the new type before a production line is established. However, because of the definition of the anomaly, a typical anomaly detector considers the new type of products anomalous even if it is consistent with the standard. Given the above practical demand, this study propose a novel problem setting, few-shot anomaly detection, where an anomaly detector trained in source domains is adapted to a small set of target samples without full retraining. Then, we tackle this problem using a hierarchical probabilistic model based on deep learning. Our empirical results on toy and real-world datasets demonstrate that the proposed model detects anomalies in a small set of target samples successfully.
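
    To make the adaptation idea concrete, here is an illustrative NumPy sketch (not the paper's deep hierarchical model): statistics shared across the source product types stay fixed, only a group-level parameter (here, the mean) is re-estimated from a handful of target samples, and anomalies are scored by Mahalanobis distance.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    source = rng.normal(size=(1000, 8))                # many samples from old product types
    shared_cov = np.cov(source, rowvar=False)          # shared statistic, stays fixed
    inv_cov = np.linalg.inv(shared_cov)

    target_few = rng.normal(loc=0.5, size=(5, 8))      # only 5 samples of the new type
    target_mean = target_few.mean(axis=0)              # the only adapted parameter

    def anomaly_score(x):
        d = x - target_mean
        return float(d @ inv_cov @ d)                  # squared Mahalanobis distance

    print(anomaly_score(rng.normal(loc=0.5, size=8)))  # normal-like sample: low score
    print(anomaly_score(rng.normal(loc=5.0, size=8)))  # defective-like sample: high score
    ```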

  • Pre-Training of DNN-Based Speech Synthesis Based on Bidirectional Conversion between Text and Speech

    Kentaro SONE  Toru NAKASHIKA  

     
    PAPER-Speech and Hearing

      Publicized:
    2019/05/15
      Vol:
    E102-D No:8
      Page(s):
    1546-1553

    Conventional approaches to statistical parametric speech synthesis use context-dependent hidden Markov models (HMMs) clustered using decision trees to generate speech parameters from linguistic features. However, decision trees are not always appropriate for efficiently modeling complex context dependencies of linguistic features. An alternative scheme that replaces decision trees with deep neural networks (DNNs) was presented as a possible way to overcome this difficulty. By training the network to represent high-dimensional feedforward dependencies from linguistic features to acoustic features, DNN-based speech synthesis systems convert text into speech. To improve the naturalness of the synthesized speech, this paper presents a novel pre-training method for DNN-based statistical parametric speech synthesis systems. In our method, a deep relational model (DRM), which represents the joint probability of two sets of visible variables, is applied to describe the joint distribution of acoustic and linguistic features. As with DNNs, a DRM consists of several hidden layers and two visible layers. Whereas DNNs represent feedforward dependencies from one set of visible variables (inputs) to the other (outputs), a DRM can represent the bidirectional dependencies between the two sets of visible variables. During maximum-likelihood (ML) based training, the model optimizes the parameters of its deep architecture (connection weights between adjacent layers, and biases) considering the bidirectional conversion between 1) acoustic features given linguistic features, and 2) linguistic features given the acoustic features generated by the model itself. Because the method considers whether the generated acoustic features are recognizable, it can obtain reasonable parameters for speech synthesis. Experimental results on a speech synthesis task show that DNN-based systems pre-trained with our proposed method outperformed randomly initialized DNN-based systems, especially when the amount of training data is limited. Additionally, speaker-dependent speech recognition results also show that our method outperformed DNN-based systems when its initial parameters were set to the same values as in the synthesis experiments.
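
    A hedged way to write the joint model described above (a sketch of the general form only, not the paper's exact parameterization), with linguistic features x, acoustic features y, and the hidden layers collectively denoted h, is

    ```latex
    p(\mathbf{x}, \mathbf{y}) = \frac{1}{Z} \sum_{\mathbf{h}} \exp\!\bigl(-E(\mathbf{x}, \mathbf{h}, \mathbf{y})\bigr),
    \qquad
    Z = \sum_{\mathbf{x}, \mathbf{h}, \mathbf{y}} \exp\!\bigl(-E(\mathbf{x}, \mathbf{h}, \mathbf{y})\bigr),
    ```

    so that both conditionals p(y | x) and p(x | y) are available from the same parameters, which is what the bidirectional pre-training exploits.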

  • Stock Price Prediction by Deep Neural Generative Model of News Articles

    Takashi MATSUBARA  Ryo AKITA  Kuniaki UEHARA  

     
    PAPER-Datamining Technologies

      Publicized:
    2018/01/19
      Vol:
    E101-D No:4
      Page(s):
    901-908

    In this study, we propose a deep neural generative model for predicting daily stock price movements given news articles. Approaches involving conventional technical analysis have been investigated to identify certain patterns in past price movements, which in turn helps to predict future price movements. However, the financial market is highly sensitive to specific events, including corporate buyouts, product releases, and the like. Therefore, recent research has focused on modeling relationships between these events that appear in the news articles and future price movements; however, a very large number of news articles are published daily, each article containing rich information, which results in overfitting to past price movements used for parameter adjustment. Given the above, we propose a model based on a generative model of news articles that includes price movement as a condition, thereby avoiding excessive overfitting thanks to the nature of the generative model. We evaluate our proposed model using historical price movements of Nikkei 225 and Standard & Poor's 500 Stock Index, confirming that our model predicts future price movements better than such conventional classifiers as support vector machines and multilayer perceptrons. Further, our proposed model extracts significant words from news articles that are directly related to future stock price movements.
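
    As a much simpler stand-in for the idea of classifying with a generative model of articles conditioned on price movement (this is a hedged naive Bayes illustration on synthetic counts, not the paper's deep neural generative model), one can fit p(article | movement) for each movement and predict by Bayes' rule:

    ```python
    import numpy as np

    vocab_size = 1000
    rng = np.random.default_rng(0)
    # toy corpora: bag-of-words counts of articles observed on "up" and "down" days
    arts_up = rng.poisson(0.5, size=(200, vocab_size))
    arts_down = rng.poisson(0.5, size=(180, vocab_size))

    def word_log_probs(counts, alpha=1.0):
        totals = counts.sum(axis=0) + alpha            # Laplace smoothing
        return np.log(totals / totals.sum())

    log_p_up, log_p_down = word_log_probs(arts_up), word_log_probs(arts_down)
    log_prior_up = np.log(len(arts_up) / (len(arts_up) + len(arts_down)))
    log_prior_down = np.log(len(arts_down) / (len(arts_up) + len(arts_down)))

    def predict(article_counts):
        s_up = log_prior_up + article_counts @ log_p_up
        s_down = log_prior_down + article_counts @ log_p_down
        return "up" if s_up > s_down else "down"

    print(predict(rng.poisson(0.5, size=vocab_size)))
    ```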

  • Modeling Storylines in Lyrics

    Kento WATANABE  Yuichiroh MATSUBAYASHI  Kentaro INUI  Satoru FUKAYAMA  Tomoyasu NAKANO  Masataka GOTO  

     
    PAPER-Natural Language Processing

      Publicized:
    2017/12/22
      Vol:
    E101-D No:4
      Page(s):
    1167-1179

    This paper addresses the issue of modeling the discourse nature of lyrics and presents the first study aiming to capture two common discourse-related notions: storylines and themes. We assume that a storyline is a chain of transitions over the topics of segments and that a song as a whole has at least one theme. We then hypothesize that transitions over the topics of lyric segments can be captured by a probabilistic topic model that incorporates a distribution over transitions of latent topics, and that such a distribution of topic transitions is affected by the theme of the lyrics. Aiming to test those hypotheses, this study conducts experiments on word prediction and segment order prediction tasks, exploiting a large-scale corpus of popular music lyrics in both English and Japanese (around 100 thousand songs). The findings gained from these experiments can be summarized in two respects. First, the models with topic transitions significantly outperformed the model without topic transitions in word prediction. This result indicates that typical storylines included in our lyrics datasets were effectively captured as a probabilistic distribution of transitions over latent topics of segments. Second, the model incorporating a latent theme variable on top of topic transitions outperformed the models without such a variable in both word prediction and segment order prediction. From this result, we can conclude that considering the notion of theme does contribute to the modeling of storylines of lyrics.
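
    One hedged way to write a generative story consistent with these hypotheses (a sketch of the form only; the paper's exact model may differ) is: a latent theme k governs the transition distribution over segment topics z_1, ..., z_T, and the words w of each segment s_t are drawn from that segment's topic,

    ```latex
    p(\mathbf{w}, \mathbf{z}, k)
    = p(k)\, p(z_1 \mid k) \prod_{t=2}^{T} p(z_t \mid z_{t-1}, k)
      \prod_{t=1}^{T} \prod_{w \in s_t} p(w \mid z_t).
    ```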

  • Deep Relational Model: A Joint Probabilistic Model with a Hierarchical Structure for Bidirectional Estimation of Image and Labels

    Toru NAKASHIKA  

     
    PAPER-Artificial Intelligence, Data Mining

      Publicized:
    2017/10/25
      Vol:
    E101-D No:2
      Page(s):
    428-436

    Two different types of representations, such as an image and its manually assigned corresponding labels, generally have complex and strong relationships to each other. In this paper, we represent such deep relationships between two different types of visible variables using an energy-based probabilistic model, called a deep relational model (DRM), to improve prediction accuracy. A DRM stacks several layers from one visible layer onto another visible layer, sandwiching several hidden layers between them. As with restricted Boltzmann machines (RBMs) and deep Boltzmann machines (DBMs), all connections (weights) between two adjacent layers are undirected. During maximum-likelihood (ML) based training, the network attempts to capture the latent complex relationships between the two visible variables with its deep architecture. Unlike deep neural networks (DNNs), 1) the DRM is a fully generative model, 2) it allows us to generate one set of visible variables given the other, and 3) its parameters can be optimized in a probabilistic manner. The DRM can also be fine-tuned with DNNs, as in deep belief net (DBN) or DBM pre-training. This paper presents experiments conducted to evaluate the performance of a DRM in image recognition and generation tasks using the MNIST data set. In the image recognition experiments, we observed that the DRM outperformed DNNs even without fine-tuning. In the image generation experiments, we obtained much more realistic images from the DRM than from the other generative models.
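
    For concreteness, a plausible energy function for a DRM with two visible layers v(1), v(2) and two hidden layers h(1), h(2) sandwiched between them (a hedged sketch assuming binary units; the paper's exact parameterization is not reproduced here) is

    ```latex
    E(\mathbf{v}^{(1)}, \mathbf{h}^{(1)}, \mathbf{h}^{(2)}, \mathbf{v}^{(2)})
    = -\mathbf{v}^{(1)\top}\mathbf{W}^{(1)}\mathbf{h}^{(1)}
      -\mathbf{h}^{(1)\top}\mathbf{W}^{(2)}\mathbf{h}^{(2)}
      -\mathbf{h}^{(2)\top}\mathbf{W}^{(3)}\mathbf{v}^{(2)}
      -\mathbf{b}^{(1)\top}\mathbf{v}^{(1)} -\mathbf{c}^{(1)\top}\mathbf{h}^{(1)}
      -\mathbf{c}^{(2)\top}\mathbf{h}^{(2)} -\mathbf{b}^{(2)\top}\mathbf{v}^{(2)},
    ```

    with p(v(1), v(2)) obtained by summing exp(-E) over the hidden layers and normalizing, so either visible layer can be inferred from the other.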

  • Enhancing Event-Related Potentials Based on Maximum a Posteriori Estimation with a Spatial Correlation Prior

    Hayato MAKI  Tomoki TODA  Sakriani SAKTI  Graham NEUBIG  Satoshi NAKAMURA  

     
    PAPER

      Publicized:
    2016/04/01
      Vol:
    E99-D No:6
      Page(s):
    1437-1446

    In this paper, a new method for noise removal from single-trial event-related potentials recorded with a multi-channel electroencephalogram is presented. An observed signal is separated into multiple signals with a multi-channel Wiener filter whose coefficients are estimated based on parameter estimation of a probabilistic generative model that locally models the amplitude of each separated signal in the time-frequency domain. The effectiveness of using prior information about covariance matrices to estimate the model parameters, and of frequency-dependent covariance matrices, was shown through an experiment with a simulated event-related potential data set.
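
    As a minimal illustration of the separation step (a NumPy sketch with toy covariances, not the paper's MAP estimation of the model parameters), the multi-channel Wiener filter for one time-frequency bin is W = R_s (R_s + R_n)^{-1}, applied to the observed channel vector:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    channels = 4
    A = rng.normal(size=(channels, channels))
    R_s = A @ A.T                             # target spatial covariance (toy, SPD)
    B = rng.normal(size=(channels, channels))
    R_n = B @ B.T                             # noise spatial covariance (toy, SPD)

    W = R_s @ np.linalg.inv(R_s + R_n)        # multi-channel Wiener filter
    x = rng.normal(size=channels)             # observed multi-channel sample in one bin
    s_hat = W @ x                             # estimated target (ERP) component
    print(s_hat)
    ```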

  • Robust Scene Categorization via Scale-Rotation Invariant Generative Model and Kernel Sparse Representation Classification

    Jinjun KUANG  Yi CHAI  

     
    LETTER-Image Recognition, Computer Vision

      Vol:
    E96-D No:3
      Page(s):
    758-761

    This paper presents a novel scale-rotation invariant generative model (SRIGM) and a kernel sparse representation classification (KSRC) method for scene categorization. Recently, sparse representation classification (SRC) methods have been highly successful in a number of image processing tasks. Despite its popularity, the SRC framework lacks the ability to handle multi-class data with high inter-class similarity or high intra-class variation. The kernel random coordinate descent (KRCD) algorithm is proposed for ℓ1 minimization in the kernel space under the KSRC framework. It allows the proposed method to obtain satisfactory classification accuracy when inter-class similarity is high. The training samples are partitioned at multiple scales and rotated at different resolutions to create a generative model that is invariant to scale and rotation changes. This model enables the KSRC framework to overcome the high intra-class variation problem for scene categorization. The experimental results show that the proposed method obtains more stable performance than other existing state-of-the-art scene categorization methods.
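
    For reference, the snippet below is a hedged sketch of the plain SRC decision rule that KSRC extends (the kernelization and the KRCD solver are omitted, and the data are synthetic): a test sample is sparsely coded over the training dictionary by ℓ1 minimization and assigned to the class whose atoms give the smallest reconstruction residual.

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 20))               # 60 training samples, 20-dim features
    y = np.repeat(np.arange(3), 20)             # 3 classes, 20 samples each
    x_test = X[5] + 0.05 * rng.normal(size=20)  # noisy copy of a class-0 sample

    coder = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000)
    coder.fit(X.T, x_test)                      # dictionary columns = training samples
    code = coder.coef_                          # sparse code over the dictionary

    residuals = []
    for c in range(3):
        mask = (y == c)
        recon = X[mask].T @ code[mask]          # reconstruction from class-c atoms only
        residuals.append(np.linalg.norm(x_test - recon))
    print(int(np.argmin(residuals)))            # typically 0, the class of X[5]
    ```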

  • Constraining a Generative Word Alignment Model with Discriminative Output

    Chooi-Ling GOH  Taro WATANABE  Hirofumi YAMAMOTO  Eiichiro SUMITA  

     
    PAPER-Natural Language Processing

      Vol:
    E93-D No:7
      Page(s):
    1976-1983

    We present a method to constrain a statistical generative word alignment model with the output from a discriminative model. The discriminative model is trained using a small set of hand-aligned data, which ensures higher precision in alignment. On the other hand, the generative model improves the recall of alignment. By combining these two models, the alignment output becomes more suitable for use in developing a translation model for a phrase-based statistical machine translation (SMT) system. Our experimental results show that the joint alignment model improves the translation performance. The average improvement in BLEU and METEOR scores is around 1.0-3.9 points.
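
    Purely to illustrate the precision/recall trade-off being combined here (a hedged toy heuristic, not the paper's actual constraint mechanism inside the generative model), one could keep the high-precision discriminative links as hard constraints and add generative links only where they do not conflict:

    ```python
    # Each alignment is a set of (source_index, target_index) links.
    def combine_alignments(discriminative, generative):
        combined = set(discriminative)
        constrained_src = {s for s, _ in discriminative}
        constrained_tgt = {t for _, t in discriminative}
        for s, t in generative:
            if s not in constrained_src and t not in constrained_tgt:
                combined.add((s, t))          # generative link covers an unaligned word
        return combined

    disc = {(0, 0), (2, 1)}
    gen = {(0, 0), (1, 2), (2, 3)}            # (2, 3) conflicts with the constraint (2, 1)
    print(sorted(combine_alignments(disc, gen)))  # [(0, 0), (1, 2), (2, 1)]
    ```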

  • Global and Local Feature Extraction by Natural Elastic Nets

    Jiann-Ming WU  Zheng-Han LIN  

     
    LETTER-Pattern Recognition

      Vol:
    E87-D No:9
      Page(s):
    2267-2271

    This work explores generative models of handwritten digit images using natural elastic nets. The analysis aims to extract global features as well as distributed local features of handwritten digits. These features are expected to form a basis that is significant for discriminant analysis of handwritten digits and related analysis of character images or natural images.
