Keyword Search Result

[Keyword] semi-supervised(32hit)

21-32hit(32hit)

  • Sentiment Classification in Under-Resourced Languages Using Graph-Based Semi-Supervised Learning Methods Open Access

    Yong REN  Nobuhiro KAJI  Naoki YOSHINAGA  Masaru KITSUREGAWA  

     
    PAPER

    Vol: E97-D No: 4    Page(s): 790-797

    In sentiment classification, conventional supervised approaches rely heavily on large amounts of linguistic resources, which are costly to obtain for under-resourced languages. To overcome this resource scarcity, several methods exploit graph-based semi-supervised learning (SSL). However, fundamental issues such as controlling label propagation, choosing the initial seeds, and selecting edges have barely been studied. Our evaluation on three real datasets demonstrates that manipulating the label-propagation behavior and choosing labeled seeds appropriately play a critical role in adopting graph-based SSL approaches for this task.
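For readers unfamiliar with graph-based SSL, the following is a minimal sketch of one generic label-propagation scheme with a damping factor. It is not the authors' method: the function name `propagate_labels`, the parameter `alpha`, and the toy graph are assumptions chosen only to illustrate how seed choice and propagation strength enter the computation.

```python
import numpy as np

def propagate_labels(W, seeds, n_classes, alpha=0.9, n_iter=100):
    """Generic label propagation on a graph (illustrative sketch only).

    W        : symmetric non-negative affinity matrix, shape (n, n)
    seeds    : dict mapping node index -> class index (the labeled seeds)
    alpha    : hypothetical damping factor; larger values propagate further
    """
    n = W.shape[0]
    deg = W.sum(axis=1)
    deg[deg == 0] = 1.0                      # avoid division by zero for isolated nodes
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetrically normalized affinity matrix D^{-1/2} W D^{-1/2}
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    Y = np.zeros((n, n_classes))             # one-hot rows for the seed nodes
    for node, cls in seeds.items():
        Y[node, cls] = 1.0

    F = Y.copy()
    for _ in range(n_iter):
        # Spread scores along edges, then pull back toward the seed labels
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return F.argmax(axis=1)

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by one weak edge.
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

labels = propagate_labels(W, seeds={0: 0, 5: 1}, n_classes=2)
print(labels)
```

With one seed per triangle, each triangle inherits its own seed's label; moving a seed or changing `alpha` changes how far a label spreads across the weak edge.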

  • Semi-Supervised Nonparametric Discriminant Analysis

    Xianglei XING  Sidan DU  Hua JIANG  

     
    LETTER-Pattern Recognition

    Vol: E96-D No: 2    Page(s): 375-378

    We extend the Nonparametric Discriminant Analysis (NDA) algorithm to a semi-supervised dimensionality reduction technique, called Semi-supervised Nonparametric Discriminant Analysis (SNDA). SNDA preserves the inherent advantages of NDA, that is, relaxing the Gaussian assumption required for the traditional LDA-based methods. SNDA takes advantage of both the discriminating power provided by the NDA method and the locality-preserving power provided by the manifold learning. Specifically, the labeled data points are used to maximize the separability between different classes and both the labeled and unlabeled data points are used to build a graph incorporating neighborhood information of the data set. Experiments on synthetic as well as real datasets demonstrate the effectiveness of the proposed approach.

  • Risk-Based Semi-Supervised Discriminative Language Modeling for Broadcast Transcription

    Akio KOBAYASHI  Takahiro OKU  Toru IMAI  Seiichi NAKAGAWA  

     
    PAPER-Speech and Hearing

    Vol: E95-D No: 11    Page(s): 2674-2681

    This paper describes a new method for semi-supervised discriminative language modeling, which is designed to improve the robustness of a discriminative language model (LM) obtained from manually transcribed (labeled) data. The discriminative LM is implemented as a log-linear model, which employs a set of linguistic features derived from word or phoneme sequences. The proposed semi-supervised discriminative modeling is formulated as a multi-objective optimization programming (MOP) problem, which consists of two objective functions defined on both labeled lattices and automatic speech recognition (ASR) lattices as unlabeled data. The objectives are coherently designed based on the expected risks that reflect information about word errors for the training data. The model is trained in a discriminative manner and acquired as a solution to the MOP problem. In transcribing Japanese broadcast programs, the proposed method achieved a 6.3% relative reduction in word error rate compared with a conventional trigram LM.
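As a reading aid, a relative reduction in word error rate (WER) is conventionally computed as below. The symbols are assumptions introduced here for illustration; the abstract does not give the absolute error rates.

```latex
\[
  \text{relative WER reduction}
  = \frac{\mathrm{WER}_{\text{baseline}} - \mathrm{WER}_{\text{proposed}}}
         {\mathrm{WER}_{\text{baseline}}}
\]
```

Under this convention, the reported 6.3% is a fraction of the baseline trigram LM's error rate, not a drop of 6.3 percentage points.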

  • Early Stopping Heuristics in Pool-Based Incremental Active Learning for Least-Squares Probabilistic Classifier

    Tsubasa KOBAYASHI  Masashi SUGIYAMA  

     
    PAPER-Artificial Intelligence, Data Mining

    Vol: E95-D No: 8    Page(s): 2065-2073

    The objective of pool-based incremental active learning is to choose a sample to label from a pool of unlabeled samples in an incremental manner so that the generalization error is minimized. In this scenario, the generalization error often hits a minimum in the middle of the incremental active learning procedure and then starts to increase. In this paper, we address the problem of stopping labeling early in probabilistic classification in order to minimize the generalization error and the labeling cost. Among several possible strategies, we propose to stop labeling when the empirical class-posterior approximation error is maximized. Experiments on benchmark datasets demonstrate the usefulness of the proposed strategy.

  • Learning to Generate a Table-of-Contents with Supportive Knowledge

    Viet Cuong NGUYEN  Le Minh NGUYEN  Akira SHIMAZU  

     
    PAPER

    Vol: E94-D No: 3    Page(s): 423-431

    In the text summarization field, a table-of-contents is a type of indicative summary that is especially suited for locating information in a long document or a set of documents. It is also a useful summary for a reader to quickly get an overview of the entire contents. The current models for generating a table-of-contents produce relatively low-quality output with many meaningless titles, or titles that have no overlapping meaning with the corresponding contents. This problem may be due to the lack of semantic information and topic information in those models. In this research, we propose to integrate supportive knowledge into the learning models to improve the quality of titles in a generated table-of-contents. The supportive knowledge is derived from a hierarchical clustering of words, which is built from a large collection of raw text, and a topic model, which is directly estimated from the training data. The relatively good experimental results show that the semantic and topic information supplied by the supportive knowledge benefits title generation and therefore helps to improve the quality of the generated table-of-contents.

  • Laplacian Support Vector Machines with Multi-Kernel Learning

    Lihua GUO  Lianwen JIN  

     
    LETTER-Pattern Recognition

    Vol: E94-D No: 2    Page(s): 379-383

    The Laplacian support vector machine (LSVM) is a semi-supervised framework that uses manifold regularization for learning from labeled and unlabeled data. However, the optimal kernel parameters of LSVM are difficult to obtain. In this paper, we propose a multi-kernel LSVM (MK-LSVM) method using multi-kernel learning formulations in combination with the LSVM. Our learning formulations assume that a set of base kernels is grouped, and employ l2-norm regularization for automatically seeking the optimal linear combination of base kernels. Experimental testing reveals that our method achieves better performance than the LSVM alone on synthetic data, the UCI Machine Learning Repository, and the Caltech database of Generic Object Classification.

  • A Semi-Supervised Approach to Perceived Age Prediction from Face Images

    Kazuya UEKI  Masashi SUGIYAMA  Yasuyuki IHARA  

     
    LETTER-Image Recognition, Computer Vision

    Vol: E93-D No: 10    Page(s): 2875-2878

    We address the problem of perceived age estimation from face images, and propose a new semi-supervised approach involving two novel aspects. The first novelty is an efficient active learning strategy for reducing the cost of labeling face samples. Given a large number of unlabeled face samples, we reveal the cluster structure of the data and propose to label cluster-representative samples for covering as many clusters as possible. This simple sampling strategy allows us to boost the performance of a manifold-based semi-supervised learning method with only a relatively small number of labeled samples. The second contribution is to take the heterogeneous characteristics of human age perception into account. It is rare to misjudge the age of a 5-year-old child as 15 years old, but the age of a 35-year-old person is often misjudged as 45 years old. Thus, the magnitude of the error differs depending on the subject's age. We carried out a large-scale questionnaire survey for quantifying human age perception characteristics, and propose to utilize the quantified characteristics in the framework of weighted regression. Consequently, our proposed method is expressed in the form of weighted least-squares with a manifold regularizer, which is scalable to massive datasets. Through real-world age estimation experiments, we demonstrate the usefulness of the proposed method.
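The phrase "weighted least-squares with a manifold regularizer" can be illustrated with a minimal sketch. This is not the authors' formulation: it assumes a plain linear model, a graph Laplacian built from a given adjacency matrix, and hypothetical names (`fit_weighted_laplacian_ls`, `sample_w`, `lam`, `mu`). The per-sample weights stand in for the age-dependent error tolerance described in the abstract.

```python
import numpy as np

def fit_weighted_laplacian_ls(X, y, labeled_idx, sample_w, A, lam=1e-3, mu=1e-2):
    """Closed-form weighted least squares with a graph-Laplacian penalty.

    Minimizes, over theta (illustrative objective, assumed here):
        sum_i w_i * (y_i - x_i^T theta)^2      over labeled samples
      + lam * ||theta||^2                      ridge term
      + mu  * theta^T X^T L X theta            smoothness over all samples

    X           : all samples (labeled and unlabeled), shape (n, d)
    y           : targets for the labeled samples, in labeled_idx order
    labeled_idx : indices of the labeled rows of X
    sample_w    : non-negative weight per labeled sample
    A           : symmetric adjacency matrix over all n samples
    """
    L = np.diag(A.sum(axis=1)) - A           # unnormalized graph Laplacian
    Xl = X[labeled_idx]
    Wd = np.diag(sample_w)
    lhs = Xl.T @ Wd @ Xl + lam * np.eye(X.shape[1]) + mu * (X.T @ L @ X)
    rhs = Xl.T @ Wd @ y
    return np.linalg.solve(lhs, rhs)

# Toy data: four points on a line, only the two endpoints are labeled.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # bias + feature
y = np.array([2.0, 11.0])                                        # y = 2 + 3x at x = 0, 3
A = np.zeros((4, 4))
for i in range(3):                                               # chain graph 0-1-2-3
    A[i, i + 1] = A[i + 1, i] = 1.0

theta_plain = fit_weighted_laplacian_ls(X, y, [0, 3], np.ones(2), A, lam=0.0, mu=0.0)
theta_smooth = fit_weighted_laplacian_ls(X, y, [0, 3], np.ones(2), A, lam=0.0, mu=1.0)
print(theta_plain, theta_smooth)
```

Because the solution is a single linear solve, the cost depends on the model dimension rather than on an iterative optimizer, which is one reason a least-squares formulation of this kind can scale to large datasets.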

  • On Computational Issues of Semi-Supervised Local Fisher Discriminant Analysis

    Masashi SUGIYAMA  

     
    LETTER-Artificial Intelligence and Cognitive Science

    Vol: E92-D No: 5    Page(s): 1204-1208

    Dimensionality reduction is one of the important preprocessing steps in practical pattern recognition. SEmi-supervised Local Fisher discriminant analysis (SELF)--which is a semi-supervised and local extension of Fisher discriminant analysis--was shown to work excellently in experiments. However, when data dimensionality is very high, a naive use of SELF is prohibitive due to high computational costs and large memory requirements. In this paper, we introduce computational tricks for making SELF applicable to large-scale problems.

  • Semi-Supervised Classification with Spectral Projection of Multiplicatively Modulated Similarity Data

    Weiwei DU  Kiichi URAHAMA  

     
    LETTER-Pattern Recognition

    Vol: E90-D No: 9    Page(s): 1456-1459

    A simple and efficient semi-supervised classification method is presented. An unsupervised spectral mapping method is extended to a semi-supervised situation with multiplicative modulation of similarities between data. Our proposed algorithm is derived by linearization of this nonlinear semi-supervised mapping method. Experiments with the proposed method on some public benchmark data and color image data reveal that our method outperforms a supervised algorithm using linear discriminant analysis and a previous semi-supervised classification method.

  • Semi-Supervised Classification with Spectral Subspace Projection of Data

    Weiwei DU  Kiichi URAHAMA  

     
    LETTER-Pattern Recognition

    Vol: E90-D No: 1    Page(s): 374-377

    A semi-supervised classification method is presented. A robust unsupervised spectral mapping method is extended to a semi-supervised situation. Our proposed algorithm is derived by linearization of this nonlinear semi-supervised mapping method. Experiments with the proposed method on some public benchmark data reveal that our method outperforms a supervised algorithm using linear discriminant analysis on the iris and wine data and is also more accurate than a semi-supervised algorithm of the logistic GRF on the ionosphere dataset.

  • Unsupervised and Semi-Supervised Extraction of Clusters from Hypergraphs

    Weiwei DU  Kohei INOUE  Kiichi URAHAMA  

     
    LETTER-Biological Engineering

    Vol: E89-D No: 7    Page(s): 2315-2318

    We extend a graph spectral method for extracting clusters from graphs representing pairwise similarity between data to hypergraph data with hyperedges denoting higher order similarity between data. Our method is robust to noisy outlier data and the number of clusters can be easily determined. The unsupervised method extracts clusters sequentially in the order of the majority of clusters. We derive from the unsupervised algorithm a semi-supervised one which can extract any cluster irrespective of its majority. The performance of those methods is exemplified with synthetic toy data and real image data.

  • User Feedback-Driven Document Clustering Technique for Information Organization

    Han-joon KIM  Sang-goo LEE  

     
    LETTER-Databases

    Vol: E85-D No: 6    Page(s): 1043-1048

    This paper discusses a new type of semi-supervised document clustering that uses partial supervision to partition a large set of documents. Most clustering methods organize documents into groups based only on similarity measures. In this paper, we attempt to isolate more semantically coherent clusters by employing the domain-specific knowledge provided by a document analyst. By using external human knowledge to guide the clustering mechanism with some flexibility when creating the clusters, clustering efficiency can be considerably enhanced. Experimental results show that the use of only a little external knowledge can considerably enhance the quality of clustering results that satisfy users' constraints.

