Keyword Search Result

[Keyword] speaker verification (19 hits)

Results 1-19
  • Blind Bandwidth Extension with a Non-Linear Function and Its Evaluation on Automatic Speaker Verification

    Ryota KAMINISHI  Haruna MIYAMOTO  Sayaka SHIOTA  Hitoshi KIYA  

     
    PAPER

      Publicized:
    2019/10/25
      Vol:
    E103-D No:1
      Page(s):
    42-49

    This study evaluates the effects of several non-learning blind bandwidth extension (BWE) methods on state-of-the-art automatic speaker verification (ASV) systems. Recently, a non-linear bandwidth extension (N-BWE) method was proposed as a blind, non-learning, lightweight BWE approach; other non-learning BWE methods have also been developed in recent years. Most of the data available for training ASV systems is narrowband (NB) telephone speech, while wideband (WB) data have been used to train state-of-the-art ASV systems such as i-vector, d-vector, and x-vector models. This can cause sampling-rate mismatches when all datasets are used together. In this paper, we investigate the influence of sampling-rate mismatches on x-vector-based ASV systems and how non-learning BWE methods perform against them. The results showed that the N-BWE method improved the equal error rate (EER) of x-vector-based ASV systems when such mismatches were present. We also investigated the relationship between objective measurements and EERs: the N-BWE method produced the lowest EERs on both ASV systems, together with the lower RMS-LSD value and the higher STOI score.
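
    As a rough illustration of the principle behind non-linear blind BWE, a memoryless non-linearity applied to a narrowband signal generates harmonics above the original band. The sketch below is a toy version under that assumption; the actual N-BWE method evaluated in the paper uses a different, more careful design.

```python
import math

def nonlinear_bwe(frame, alpha=0.5):
    # Toy blind bandwidth extension: a memoryless non-linearity
    # (half-wave rectification) creates harmonics of the input,
    # which are mixed back in to mimic a high-band component.
    harmonics = [max(s, 0.0) for s in frame]
    dc = sum(harmonics) / len(harmonics)
    harmonics = [h - dc for h in harmonics]  # remove DC added by rectification
    return [s + alpha * h for s, h in zip(frame, harmonics)]

# a narrowband test tone
tone = [math.sin(2 * math.pi * 0.05 * n) for n in range(160)]
extended = nonlinear_bwe(tone)
```

    In a real system the non-linearity would be followed by filtering that keeps only the newly generated high band before resampling to the wideband rate.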

  • Speaker-Phonetic I-Vector Modeling for Text-Dependent Speaker Verification with Random Digit Strings

    Shengyu YAO  Ruohua ZHOU  Pengyuan ZHANG  

     
    PAPER-Speech and Hearing

      Publicized:
    2018/11/19
      Vol:
    E102-D No:2
      Page(s):
    346-354

    This paper proposes a speaker-phonetic i-vector modeling method for text-dependent speaker verification with random digit strings, in which the enrollment and test utterances are not of the same phrase. The core of the proposed method is the use of digit alignment information in the i-vector framework. By utilizing forced-alignment information, verification scores for the test trials can be computed in a fixed-phrase manner, in which the compared speech segments of the enrollment and test utterances have the same phonetic content. Specifically, utterances are segmented into digits, and a phonetically constrained i-vector extractor is then applied to obtain a speaker and channel variability representation for every digit segment. Probabilistic linear discriminant analysis (PLDA) and s-norm are subsequently used for channel compensation and score normalization, respectively. The final score is obtained by combining the digit scores, which are computed by scoring individual digit segments of the test utterance against the corresponding ones of the enrollment. Experimental results on Part 3 of the Robust Speaker Recognition (RSR2015) database demonstrate that the proposed approach significantly outperforms GMM-UBM, with relative equal error rate (EER) reductions of 52.3% and 53.5% for male and female speakers, respectively.
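
    The s-norm step mentioned above is a standard symmetric score normalization. A minimal sketch, with small illustrative cohort score lists standing in for real cohort trials:

```python
from statistics import mean, stdev

def s_norm(raw, enroll_cohort_scores, test_cohort_scores):
    # Symmetric normalization: average of the raw score z-normalized
    # against cohort scores of the enrollment model and of the test
    # utterance, respectively.
    zn = (raw - mean(enroll_cohort_scores)) / stdev(enroll_cohort_scores)
    tn = (raw - mean(test_cohort_scores)) / stdev(test_cohort_scores)
    return 0.5 * (zn + tn)

# illustrative cohort scores (placeholders, not from the paper)
score = s_norm(2.0, [0.1, -0.2, 0.3, 0.0], [0.2, -0.1, 0.1, 0.4])
```

    A raw score far above both cohort distributions normalizes to a large positive value, which is what makes the threshold easier to set across conditions.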

  • Lexicon-Based Local Representation for Text-Dependent Speaker Verification

    Hanxu YOU  Wei LI  Lianqiang LI  Jie ZHU  

     
    LETTER-Speech and Hearing

      Publicized:
    2016/12/05
      Vol:
    E100-D No:3
      Page(s):
    587-589

    A text-dependent i-vector extraction scheme and a lexicon-based binary vector (L-vector) representation are proposed to improve the performance of text-dependent speaker verification. The i-vector and L-vector are used to represent the utterances for enrollment and test. An improved cosine distance kernel is constructed by combining the i-vector and L-vector, and is used to distinguish both speaker identity and lexical (text) diversity with a back-end support vector machine (SVM). Experiments are conducted on the RSR2015 corpus part 1 and part 2; the results indicate that an improvement of up to 30% can be obtained over the traditional i-vector baseline.
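
    A minimal sketch of a combined cosine kernel in this spirit; the mixing weight `beta` and the linear combination rule are assumptions for illustration, and the letter's exact kernel may differ:

```python
import math

def cosine(u, v):
    # plain cosine similarity between two vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def combined_kernel(ivec_a, ivec_b, lvec_a, lvec_b, beta=0.5):
    # Weighted sum of speaker similarity (i-vectors) and lexical
    # overlap (binary L-vectors); beta is an assumed mixing weight.
    return beta * cosine(ivec_a, ivec_b) + (1 - beta) * cosine(lvec_a, lvec_b)

# identical i-vectors and identical L-vectors give the maximum score
same = combined_kernel([1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0])
```

    An impostor speaking the correct text would score high on the L-vector term but low on the i-vector term, and vice versa for the client speaking the wrong text, which is why combining the two helps.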

  • Deep Nonlinear Metric Learning for Speaker Verification in the I-Vector Space

    Yong FENG  Qingyu XIONG  Weiren SHI  

     
    LETTER-Speech and Hearing

      Publicized:
    2016/10/04
      Vol:
    E100-D No:1
      Page(s):
    215-219

    Speaker verification is the task of determining whether two utterances were produced by the same person. After representing the utterances in the i-vector space, the crucial remaining problem is how to compute the similarity of two i-vectors. Metric learning provides a viable solution to this problem, but most metric learning algorithms proposed to date are limited to learning a linear transformation. In this paper, we propose a nonlinear metric learning method that learns an explicit mapping from the original space to an optimal subspace using a deep Restricted Boltzmann Machine network. The proposed method is evaluated on the NIST SRE 2008 dataset, and the results show that its deep architecture yields performance superior to some state-of-the-art methods.

  • A Time-Varying Adaptive IIR Filter for Robust Text-Independent Speaker Verification

    Santi NURATCH  Panuthat BOONPRAMUK  Chai WUTIWIWATCHAI  

     
    PAPER-Speech and Hearing

      Vol:
    E96-D No:3
      Page(s):
    699-707

    This paper presents a new technique for smoothing speech feature vectors for text-independent speaker verification using an adaptive band-pass IIR filter. The filter is designed by considering the probability density of the modulation-frequency components of an M-dimensional feature vector; each dimension of the feature vector is processed and filtered separately. Initial filter parameters, the low-cut-off and high-cut-off frequencies, are first determined from the global mean of the probability densities computed from all feature vectors of a given speech utterance. The cut-off frequencies are then adapted over time, i.e., for every frame vector, in both the low-frequency and high-frequency bands, based on the global mean and the standard deviation of the feature vectors. The filtered feature vectors are used in an SVM-GMM supervector speaker verification system. The NIST Speaker Recognition Evaluation 2006 (SRE06) core test is used for evaluation. Experimental results show that the proposed technique clearly outperforms a baseline system using a conventional RelAtive SpecTrA (RASTA) filter.

  • Factor Analysis of Neighborhood-Preserving Embedding for Speaker Verification

    Chunyan LIANG  Lin YANG  Qingwei ZHAO  Yonghong YAN  

     
    LETTER-Speech and Hearing

      Vol:
    E95-D No:10
      Page(s):
    2572-2576

    In this letter, we adopt a new factor analysis of neighborhood-preserving embedding (NPE) for speaker verification. NPE aims at preserving the local neighborhood structure on the data and defines a low-dimensional speaker space called neighborhood-preserving embedding space. We compare the proposed method with the state-of-the-art total variability approach on the telephone-telephone core condition of the NIST 2008 Speaker Recognition Evaluation (SRE) dataset. The experimental results indicate that the proposed NPE method outperforms the total variability approach, providing up to 24% relative improvement.

  • Artificial Cohort Generation Based on Statistics of Real Cohorts for GMM-Based Speaker Verification

    Yuuji MUKAI  Hideki NODA  Takashi OSANAI  

     
    LETTER-Speech and Hearing

      Vol:
    E94-D No:1
      Page(s):
    162-166

    This paper discusses speaker verification (SV) using Gaussian mixture models (GMMs), where only utterances of enrolled speakers are required. Such an SV system can be realized using artificially generated cohorts instead of real cohorts from speaker databases. This paper presents a rational approach to set GMM parameters for artificial cohorts based on statistics of GMM parameters for real cohorts. Equal error rates for the proposed method are about 10% less than those for the previous method, where GMM parameters for artificial cohorts were set in an ad hoc manner.

  • Approximate Decision Function and Optimization for GMM-UBM Based Speaker Verification

    Xiang XIAO  Xiang ZHANG  Haipeng WANG  Hongbin SUO  Qingwei ZHAO  Yonghong YAN  

     
    LETTER-Speech and Hearing

      Vol:
    E92-D No:9
      Page(s):
    1798-1802

    The GMM-UBM framework has proved to be one of the most effective approaches to the automatic speaker verification (ASV) task in recent years. In this letter, we first propose an approximate decision function for the traditional GMM-UBM, which shows that each Gaussian component contributes equally to classification. However, research in speaker perception shows that different speech sound units, as defined by the Gaussian components, contribute differently to speaker verification. This motivates us to emphasize sound units that discriminate between speakers while de-emphasizing those that carry little speaker information. Experiments on the 2006 NIST SRE core task show that the proposed approach outperforms the traditional GMM-UBM approach in classification accuracy.
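
    The traditional GMM-UBM decision function that this letter approximates is the average per-frame log-likelihood ratio between the claimed speaker's GMM and the UBM. A one-dimensional toy sketch (model parameters are illustrative, not from the letter):

```python
import math

def log_gauss(x, mu, var):
    # log N(x; mu, var) for a scalar Gaussian
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def gmm_logpdf(x, weights, means, variances):
    # log sum_k w_k N(x; mu_k, var_k) for a 1-D mixture
    return math.log(sum(w * math.exp(log_gauss(x, m, v))
                        for w, m, v in zip(weights, means, variances)))

def llr_score(frames, spk, ubm):
    # average per-frame log-likelihood ratio: speaker model vs. UBM
    return sum(gmm_logpdf(x, *spk) - gmm_logpdf(x, *ubm)
               for x in frames) / len(frames)

spk = ([0.5, 0.5], [0.0, 2.0], [1.0, 1.0])   # (weights, means, variances)
ubm = ([0.5, 0.5], [-1.0, 3.0], [2.0, 2.0])
score = llr_score([0.1, 0.2, -0.1], spk, ubm)  # frames near the speaker's means
```

    In this form every mixture component enters the sum with its fixed weight, which is the equal-contribution property the letter's reweighting of sound units sets out to change.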

  • Text-Independent Speaker Verification Using Artificially Generated GMMs for Cohorts

    Yuuji MUKAI  Hideki NODA  Michiharu NIIMI  Takashi OSANAI  

     
    LETTER-Speech and Hearing

      Vol:
    E91-D No:10
      Page(s):
    2536-2539

    This paper presents a text-independent speaker verification method using Gaussian mixture models (GMMs), where only utterances of enrolled speakers are required. Artificial cohorts are used instead of those from speaker databases, and GMMs for artificial cohorts are generated by changing model parameters of the GMM for a claimed speaker. Equal error rates by the proposed method are about 60% less than those by a conventional method which also uses only utterances of enrolled speakers.

  • Evaluation of a Noise-Robust Multi-Stream Speaker Verification Method Using F0 Information

    Taichi ASAMI  Koji IWANO  Sadaoki FURUI  

     
    PAPER-Speaker Verification

      Vol:
    E91-D No:3
      Page(s):
    549-557

    We have previously proposed a noise-robust speaker verification method using fundamental frequency (F0) extracted using the Hough transform. The method also incorporates an automatic stream-weight and decision threshold estimation technique. It has been confirmed that the proposed method is effective for white noise at various SNR conditions. This paper evaluates the proposed method in more practical in-car and elevator-hall noise conditions. The paper first describes the noise-robust F0 extraction method and details of our robust speaker verification method using multi-stream HMMs for integrating the extracted F0 and cepstral features. Details of the automatic stream-weight and threshold estimation method for multi-stream speaker verification framework are also explained. This method simultaneously optimizes stream-weights and a decision threshold by combining the linear discriminant analysis (LDA) and the Adaboost technique. Experiments were conducted using Japanese connected digit speech contaminated by white, in-car, or elevator-hall noise at various SNRs. Experimental results show that the F0 features improve the verification performance in various noisy environments, and that our stream-weight and threshold optimization method effectively estimates control parameters so that FARs and FRRs are adjusted to achieve equal error rates (EERs) under various noisy conditions.

  • Speaker Verification in Realistic Noisy Environment in Forensic Science

    Toshiaki KAMADA  Nobuaki MINEMATSU  Takashi OSANAI  Hisanori MAKINAE  Masumi TANIMOTO  

     
    PAPER-Speaker Verification

      Vol:
    E91-D No:3
      Page(s):
    558-566

    In forensic voice telephony speaker verification, we may be requested to identify a speaker in a very noisy environment, unlike the conditions assumed in general research. In a noisy environment, the speech is first processed by clarification (enhancement). However, a previous study of speaker verification on clarified speech did not yield satisfactory results. In this study, we conducted speaker verification experiments with clarification of speech in a noisy environment and examined the relationship between improved acoustic quality and speaker verification results. Moreover, experiments with realistic noise, such as a crime-prevention alarm and power-supply noise, were conducted, and speaker verification accuracy in a realistic environment was examined. We confirmed the validity of speaker verification with clarification of speech in a realistic noisy environment.

  • Codebook-Based Pseudo-Impostor Data Generation and Template Compression for Text-Dependent Speaker Verification

    Jian LUAN  Jie HAO  Tomonari KAKINO  Akinori KAWAMURA  

     
    PAPER-Speech and Hearing

      Vol:
    E90-D No:9
      Page(s):
    1414-1421

    DTW-based text-dependent speaker verification is an effective scheme for protecting personal information in personal electronic products. To enhance the performance of a DTW-based system, an impostor database covering all possible passwords is generally required for normalizing the matching scores. However, this becomes impossible in our practical application scenario, since users are not restricted in their choice of password. We propose a method to generate pseudo-impostor data by employing an acoustic codebook. Based on the pseudo-impostor data, two normalization algorithms are developed. In addition, a template compression approach based on the codebook is introduced, with some modifications to the conventional DTW global constraints for the compressed template. Combining the normalization and template compression methods, we obtain relative reductions of more than 66% in storage and 35% in EER. We expect that other DTW-based tasks may also benefit from our methods.
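
    The template matching at the core of such a system is dynamic time warping. A minimal sketch over 1-D feature sequences, omitting the global path constraints and score normalizations the paper modifies:

```python
def dtw_distance(a, b):
    # Classic DTW: minimum cumulative frame-to-frame distance over all
    # monotone alignments of sequences a and b.
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

dist = dtw_distance([1, 2, 3], [1, 2, 2, 3])  # tempo-stretched copy aligns at zero cost
```

    Real systems run this over multi-dimensional cepstral frames with a vector distance in place of `abs`, and normalize the score by the path length.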

  • Assessment of On-Line Model Quality and Threshold Estimation in Speaker Verification

    Javier R. SAETA  Javier HERNANDO  

     
    PAPER-Speech and Hearing

      Vol:
    E90-D No:4
      Page(s):
    759-765

    The selection of the most representative utterances from a speaker is essential for the correct performance of automatic enrollment in speaker verification. Model quality measures and threshold estimation methods mainly deal with the scarcity of data and the difficulty of obtaining impostor data in real applications. Conventional methods estimate the quality of the training utterances only once the model has been created; in that case, it is not possible to ask the user for more utterances during the training session, and a new training session must be started. This is especially impractical in applications where only one or two enrollment sessions are allowed. In this paper, a new on-line quality method based on a male and a female Universal Background Model (UBM) is introduced. The two models act as a reference for new utterances, showing whether they belong to the same speaker while providing a measure of their quality at the same time. On the other hand, the estimation of the verification threshold is also strongly influenced by the prior selection of the speaker's utterances. In this context, potential outliers, i.e., client scores that are far from the mean, could lead to wrong estimates of the client mean and variance. To alleviate this problem, efficient threshold estimation methods based on removing or weighting scores are proposed here: before estimating the threshold, the client scores catalogued as outliers are removed, pruned, or weighted, improving subsequent estimations. Text-dependent experiments have been carried out using a multi-session telephone database in Spanish, recorded by the authors, with 184 speakers.

  • Effects of Phoneme Type and Frequency on Distributed Speaker Identification and Verification

    Mohamed Abdel FATTAH  Fuji REN  Shingo KUROIWA  

     
    PAPER-Speech and Hearing

      Vol:
    E89-D No:5
      Page(s):
    1712-1719

    In the European Telecommunications Standards Institute (ETSI) Distributed Speech Recognition (DSR) front-end, the distortion added by feature compression on the client side increases the variance-flooring effect, which in turn increases the identification error rate. The penalty incurred in reducing the bit rate is thus a degradation in speaker recognition performance. In this paper, we present a nontraditional solution to this problem. To reduce the bit rate, a speech signal is segmented at the client, and the phonemes most effective for speaker recognition (determined according to their type and frequency) are selected and sent to the server, where speaker recognition takes place. Applying this approach to the YOHO corpus, we achieved an identification error rate (ER) of 0.05% using an average of 20.4% of each testing utterance in a speaker identification task, and an equal error rate (EER) of 0.42% using an average of 15.1% of each testing utterance in a speaker verification task.

  • Improved Jacobian Adaptation for Robust Speaker Verification

    Jan ANGUITA  Javier HERNANDO  Alberto ABAD  

     
    LETTER-Speech and Hearing

      Vol:
    E88-D No:7
      Page(s):
    1767-1770

    Jacobian Adaptation (JA) has been successfully used in Automatic Speech Recognition (ASR) systems to adapt the acoustic models from the training to the testing noise conditions. In this work we present an improvement of JA for speaker verification, where a specific training noise reference is estimated for each speaker model. The new proposal, which will be referred to as Model-dependent Noise Reference Jacobian Adaptation (MNRJA), has consistently outperformed JA in our speaker verification experiments.

  • Discrimination Method of Synthetic Speech Using Pitch Frequency against Synthetic Speech Falsification

    Akio OGIHARA  Hitoshi UNNO  Akira SHIOZAKI  

     
    PAPER-Biometrics

      Vol:
    E88-A No:1
      Page(s):
    280-286

    We propose discrimination method of synthetic speech using pitch pattern of speech signal. By applying the proposed synthetic speech discrimination system as pre-process before the conventional HMM speaker verification system, we can improve the safety of conventional speaker verification system against imposture using synthetic speech. The proposed method distinguishes between synthetic speech and natural speech according to the pitch pattern which is distribution of value of normalized short-range autocorrelation function. We performed the experiment of user verification, and confirmed the validity of the proposed method.

  • Phoneme-Balanced and Digit-Sequence-Preserving Connected Digit Patterns for Text-Prompted Speaker Verification

    Tsuneo KATO  Tohru SHIMIZU  

     
    PAPER

      Vol:
    E87-D No:5
      Page(s):
    1194-1199

    This paper presents a novel design of connected digit patterns to achieve high accuracy text-prompted speaker verification over a cellular phone network. To reduce the error rate, a phoneme-balanced connected digit pattern for enrollment, and digit-sequence-preserving connected digit patterns for verification (i.e. patterns preserving partial digit sequences of the enrollment pattern) are proposed. In addition to these, a decision procedure using multiple patterns has been designed to overcome the low quality of cellular phone speech. Experimental results on cellular phone speech showed the phoneme-balanced patterns for enrollment and digit-sequence-preserving patterns for verification reduced more than 50% of equal error rate compared to the conventional method using randomly-selected and randomly-reordered digit patterns. The decision procedure reduced 60% of the error rate. In addition, this paper shows that verification patterns depending on the pattern of a preceding utterance reduced 10% of the error rate. Overall, the error rate obtained by the proposed method was 1% for 99% of clients and 95% of impostors.

  • Robust Model for Speaker Verification against Session-Dependent Utterance Variation

    Tomoko MATSUI  Kiyoaki AIKAWA  

     
    PAPER-Speech and Hearing

      Vol:
    E86-D No:4
      Page(s):
    712-718

    This paper investigates a new method for creating robust speaker models to cope with inter-session variation of a speaker in a continuous HMM-based speaker verification system. The new method estimates session-independent parameters by decomposing inter-session variations into two distinct parts: session-dependent and -independent. The parameters of the speaker models are estimated using the speaker adaptive training algorithm in conjunction with the equalization of session-dependent variation. The resultant models capture the session-independent speaker characteristics more reliably than the conventional models and their discriminative power improves accordingly. Moreover we have made our models more invariant to handset variations in a public switched telephone network (PSTN) by focusing on session-dependent variation and handset-dependent distortion separately. Text-independent speech data recorded by 20 speakers in seven sessions over 16 months was used to evaluate the new approach. The proposed method reduces the error rate by 15% relatively. When compared with the popular cepstral mean normalization, the error rate is reduced by 24% relatively when the speaker models were recreated using speech data recorded in four or more sessions.

  • A Context-Dependent Sequential Decision for Speaker Verification

    Hideki NODA  Katsuya HARADA  Eiji KAWAGUCHI  

     
    LETTER-Speech Processing and Acoustics

      Vol:
    E82-D No:10
      Page(s):
    1433-1436

    This paper presents an improved method of speaker verification using the sequential probability ratio test (SPRT), which can treat the correlation between successive feature vectors. The hidden Markov model with the mean field approximation enables us to consider the correlation in the SPRT, i. e. , using the mean field of previous state, probability computation can be carried out as if input samples were independent each other.

FlyerIEICE has prepared a flyer regarding multilingual services. Please use the one in your native language.