Atsushi NAKAMURA Masaki NAITO Hajime TSUKADA Rainer GRUHN Eiichiro SUMITA Hideki KASHIOKA Hideharu NAKAJIMA Tohru SHIMIZU Yoshinori SAGISAKA
This paper describes the application of a speech translation system to another task/domain in the real world using developmental data collected from real-world interactions. The total cost of this task alteration was calculated to be nine person-months. The newly applied system was also evaluated using speech data collected from real-world interactions. For real-world speech with a machine-friendly speaking style, the newly applied system recognized typical sentences with a word accuracy of 90% or better. We also found that, in terms of overall speech translation performance, the system could translate about 80% of the input Japanese speech into acceptable English sentences.
Naoya AZUMA Shunsuke SHIMAZAKI Noriyuki MIURA Makoto NAGATA Tomomitsu KITAMURA Satoru TAKAHASHI Motoki MURAKAMI Kazuaki HORI Atsushi NAKAMURA Kenta TSUKAMOTO Mizuki IWANAMI Eiji HANKUI Sho MUROGA Yasushi ENDO Satoshi TANAKA Masahiro YAMAGUCHI
Substrate noise coupling in RF receiver front-end circuitry for LTE wireless communication was examined by full-chip-level simulation and on-chip measurements, with a demonstrator built in a 65 nm CMOS technology. A CMOS digital noise emulator injects high-order harmonic noise into the silicon substrate and induces in-band spurious tones in an RF receiver on the same chip through substrate noise interference. A complete simulation flow for full-chip-level substrate noise coupling uses a decoupled modeling approach, in which substrate noise waveforms derived from a unified package-chip model of the noise source circuits are fed into a mixed-level simulation of the RF chains as the noise-sensitive circuits. The distribution of substrate noise across the chip and its attenuation with distance are simulated and compared with measurements. Substrate noise interference at the 17th harmonic of 124.8 MHz, the operating frequency of the CMOS noise emulator, creates spurious tones in the communication band at 2.1 GHz.
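The link between the emulator clock and the affected LTE band follows from simple harmonic arithmetic; the short check below is an illustrative sketch only, not part of the paper's simulation flow, and merely confirms that the 17th harmonic of a 124.8 MHz clock lands near 2.1 GHz.

```python
# Illustrative harmonic arithmetic only; not the paper's simulation flow.
f_clock_mhz = 124.8      # operating frequency of the CMOS noise emulator
harmonic_order = 17      # harmonic cited in the abstract

f_spur_mhz = harmonic_order * f_clock_mhz
print(f"{harmonic_order}th harmonic: {f_spur_mhz:.1f} MHz")  # 2121.6 MHz, i.e. ~2.1 GHz
```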
Tomokazu ODA Atsushi NAKAMURA Daisuke IIDA Hiroyuki OSHIDA
We propose a technique based on Brillouin optical time domain analysis for measuring loss and crosstalk in few-mode fibers (FMFs). The proposed technique extracts the loss and crosstalk of a specific mode in an FMF from the Brillouin gains and Brillouin gain coefficients measured under two different conditions of the frequency difference between the pump and probe lights. The technique yields the maximum loss and crosstalk at a splice point by changing the electric field injected into the FMF as the pump light. Experiments demonstrate that the proposed technique can measure the maximum loss and crosstalk of the LP11 mode at a splice point in a two-mode fiber.
Takanobu OBA Takaaki HORI Atsushi NAKAMURA
A dependency structure represents modification relationships between words or phrases and is recognized as an important element in semantic information analysis. Conventional approaches for extracting this dependency structure assume that the complete sentence is known before the analysis starts. For spontaneous speech data, however, this assumption does not necessarily hold, since sentence boundaries are not marked in the data. Although sentence boundaries can be detected before dependency analysis, such a cascaded implementation is not suitable for online processing since it delays the responses of the application. To solve these problems, we previously proposed a sequential dependency analysis (SDA) method for online spontaneous speech processing, which enabled us to analyze incomplete sentences sequentially and detect sentence boundaries simultaneously. In this paper, we propose an improved SDA that integrates a labeling-based sentence boundary detection (SntBD) technique based on Conditional Random Fields (CRFs). In the new method, we use a CRF for the soft decision of sentence boundaries and combine it with SDA so as to retain its online framework. Since CRF-based SntBD yields better estimates of sentence boundaries, SDA can provide better results in which the dependency structure and sentence boundaries are consistent. Experimental results on spontaneous lecture speech from the Corpus of Spontaneous Japanese show that the improved SDA outperforms the original SDA in SntBD accuracy and provides better dependency analysis results.
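As a rough illustration of the online setting described above, the sketch below uses toy stand-ins (not the paper's CRF-based SntBD or SDA parser) to show how a per-word boundary probability can drive segmentation of an unsegmented word stream so that each completed segment is handed to dependency analysis without waiting for the full transcript; note that the paper keeps the boundary decision soft inside the SDA search, whereas this sketch reduces it to a threshold for brevity.

```python
# Toy sketch of online sentence segmentation feeding a dependency analyzer.
# boundary_prob() and analyze_dependencies() are hypothetical stand-ins,
# not the CRF-based SntBD or the SDA parser described in the paper.

def boundary_prob(word: str) -> float:
    """Stand-in for a CRF posterior P(sentence boundary after this word)."""
    return 0.9 if word.endswith(("desu", "masu", ".")) else 0.05

def analyze_dependencies(segment: list[str]) -> list[tuple[int, int]]:
    """Stand-in parser: links every word to the last word of the segment."""
    head = len(segment) - 1
    return [(i, head) for i in range(head)]

def process_stream(words, threshold=0.5):
    segment, results = [], []
    for w in words:                       # words arrive one by one (online)
        segment.append(w)
        if boundary_prob(w) > threshold:  # hard decision here for brevity
            results.append(analyze_dependencies(segment))
            segment = []
    if segment:                           # flush the trailing partial segment
        results.append(analyze_dependencies(segment))
    return results

print(process_stream("kyou wa ii tenki desu tsugi no wadai ni utsurimasu".split()))
```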
Takanori SUZUKI Hideo ARIMOTO Takeshi KITATANI Aki TAKEI Takafumi TANIGUCHI Kazunori SHINODA Shigehisa TANAKA Shinji TSUJI Tatemi IDO Jun IGRASHI Atsushi NAKAMURA Kazuhiko NAOE Kenji UCHIDA
A dual-core spot-size converter (DC-SSC) is integrated with a lateral-grating-assisted lateral co-directional coupler (LGLC) tunable laser without any additional complicated fabrication processes. The excess loss due to the DC-SSC is only 0.5 dB, and the full widths at half maximum (FWHMs) of the vertical and horizontal far-field patterns (FFPs) produced by the laser are narrow, about 25° and 20°, respectively. This integration causes no degradation of the performance of the LGLC laser; in other words, the laser maintains good lasing characteristics, namely a wide tuning range of over 68 nm and a side-mode suppression ratio (SMSR) of over 35 dB in the C-band under a 50°C semi-cooled condition.
Shinji WATANABE Atsushi NAKAMURA
We introduce a robust classification method for speech recognition based on the Bayesian predictive distribution (Bayesian Predictive Classification, referred to as BPC). We and others have recently proposed a total Bayesian framework named Variational Bayesian Estimation and Clustering for speech recognition (VBEC). VBEC includes the practical computation, based on variational Bayes (VB), of the approximate posterior distributions that are essential for BPC. BPC using VB posterior distributions (VB-BPC) provides an analytical solution for the predictive distribution in the form of a Student's t-distribution, which mitigates over-training effects by marginalizing out the model parameters of the output distribution. We address the sparse data problem in speech recognition, and show experimentally that VB-BPC is robust against data sparseness.
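As a simplified illustration of the marginalization step, the sketch below computes the Student's t predictive density that arises when a univariate Gaussian observation model with a Normal-Gamma posterior over its mean and precision is marginalized; the hyperparameter names and values are generic assumptions, not VBEC's actual posterior quantities, which are defined over HMM output distributions.

```python
# Minimal sketch: Student's t predictive density obtained by marginalizing the
# mean and precision of a univariate Gaussian under a Normal-Gamma posterior
# NG(mu, kappa, a, b). Hyperparameters below are illustrative, not VBEC's.
import math

def student_t_logpdf(x, df, loc, scale):
    z = (x - loc) / scale
    return (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1.0) / 2.0 * math.log1p(z * z / df))

def predictive_logpdf(x, mu, kappa, a, b):
    # Standard conjugate result: the posterior predictive is t with 2a degrees
    # of freedom, location mu, and squared scale b*(kappa+1)/(a*kappa).
    df = 2.0 * a
    scale = math.sqrt(b * (kappa + 1.0) / (a * kappa))
    return student_t_logpdf(x, df, mu, scale)

print(predictive_logpdf(0.3, mu=0.0, kappa=10.0, a=5.0, b=4.0))
```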
Erik MCDERMOTT Atsushi NAKAMURA
Acoustic modeling in speech recognition uses very little knowledge of the speech production process. At many levels, our models continue to treat speech as a surface phenomenon. Typically, hidden Markov model (HMM) parameters operate primarily in the acoustic space or in a linear transformation thereof; state-to-state evolution is modeled only crudely, with no explicit relationship between states, such as would be afforded by the phonetic features commonly used by linguists to describe speech phenomena, or by the continuity and smoothness of the production parameters governing speech. This survey article attempts to provide an overview of proposals by several researchers for improving acoustic modeling in these regards. The topics covered include the controversial Motor Theory of Speech Perception, work by Hogden that explicitly uses a continuity constraint in a pseudo-articulatory domain, the Kalman-filter-based Hidden Dynamic Model, and work by many groups showing the benefits of using articulatory features instead of phones as the underlying units of speech.
Shinji WATANABE Yasuhiro MINAMI Atsushi NAKAMURA Naonori UEDA
Shared-state hidden Markov models (SS-HMMs) have been widely used as acoustic models in speech recognition. In this paper, we propose a method for constructing SS-HMMs within a practical Bayesian framework. Our method derives a Bayesian model selection criterion for the SS-HMM based on the variational Bayesian approach. The appropriate phonetic decision tree structure of the SS-HMM is found by using this Bayesian criterion. Unlike conventional asymptotic criteria, this criterion is applicable even when the amount of training data is insufficient. Experimental results on isolated word recognition demonstrate that the proposed method requires no tuning parameter that must be adjusted according to the amount of training data, and that it is useful for selecting an appropriate SS-HMM structure for practical use.
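The sketch below conveys the flavor of such a Bayesian split criterion in a deliberately simplified setting: each leaf is modeled as a single univariate Gaussian with a conjugate Normal-Gamma prior, for which the log evidence has a closed form, and a split is accepted only if it raises the total log evidence. This is an assumption-laden stand-in for, not a reproduction of, the paper's variational lower bound for SS-HMMs.

```python
# Simplified stand-in for Bayesian split selection: accept a decision-tree
# split only if it increases the total log marginal likelihood (evidence).
# Each leaf is a univariate Gaussian with a Normal-Gamma prior; the paper
# instead uses a variational lower bound for shared-state HMMs.
import math

def log_evidence(data, mu0=0.0, kappa0=1.0, a0=1.0, b0=1.0):
    """Closed-form log marginal likelihood of data under a Gaussian with a
    Normal-Gamma(mu0, kappa0, a0, b0) prior on its mean and precision."""
    n = len(data)
    if n == 0:
        return 0.0
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)
    kappa_n = kappa0 + n
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * ss + kappa0 * n * (mean - mu0) ** 2 / (2.0 * kappa_n)
    return (math.lgamma(a_n) - math.lgamma(a0)
            + a0 * math.log(b0) - a_n * math.log(b_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - 0.5 * n * math.log(2.0 * math.pi))

def accept_split(data, left, right):
    """Split the leaf only if doing so raises the total log evidence."""
    return log_evidence(left) + log_evidence(right) > log_evidence(data)

data = [-1.2, -0.9, -1.1, 1.0, 1.3, 0.8]
left, right = [x for x in data if x < 0], [x for x in data if x >= 0]
print(accept_split(data, left, right))   # True: the two clusters are distinct
```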
Atsunori OGAWA Satoshi TAKAHASHI Atsushi NAKAMURA
This paper proposes an efficient combination of state likelihood recycling and batch state likelihood calculation for accelerating acoustic likelihood calculation in an HMM-based speech recognizer. Recycling and batch calculation are based on different technical approaches: the former is a purely algorithmic technique, while the latter fully exploits the computer architecture. To accelerate the recognition process further by combining them efficiently, we introduce conditional fast processing and acoustic backing-off. Conditional fast processing is based on two criteria. The first, a potential activity criterion, is used to control not only the recycling of state likelihoods at the current frame but also the precalculation of state likelihoods for several succeeding frames. The second, a reliability criterion, together with acoustic backing-off, is used to control the choice between recycled and batch-calculated state likelihoods when they contradict each other in the combination, and to prevent word accuracy from degrading. Large-vocabulary spontaneous speech recognition experiments using four different CPU machines under two environmental conditions showed that, compared with the baseline recognizer and with recycling or batch calculation alone, our combined acceleration technique further reduced both the acoustic likelihood calculation time and the total recognition time. We also performed detailed analyses to reveal each technique's acceleration mechanism and environmental dependency by classifying and counting the types of state likelihoods. The analysis results confirmed the effectiveness of the combined acceleration technique.
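The sketch below gives a rough, self-contained illustration of how the two acceleration ideas can coexist: likelihoods of states judged unlikely to matter are recycled from the previous frame, while the remaining states are computed in batches of several consecutive frames. The activity test and the Gaussian scorer are simplified placeholders, not the paper's potential activity and reliability criteria.

```python
# Toy illustration of combining state-likelihood recycling with batch
# calculation. The activity test and Gaussian scorer are simplified
# placeholders, not the paper's potential activity / reliability criteria.
import math

BATCH = 4  # number of succeeding frames whose likelihoods are precalculated

def gaussian_loglike(x, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def score_states(frames, states, active_threshold=-6.0):
    cache = {}   # (state_id, frame_idx) -> batch-calculated log-likelihood
    prev = {}    # state_id -> log-likelihood at the previous frame
    scores = []
    for t, x in enumerate(frames):
        frame_scores = {}
        for sid, (mean, var) in states.items():
            likely_active = prev.get(sid, 0.0) > active_threshold
            if (sid, t) in cache:                     # already batch-calculated
                ll = cache[(sid, t)]
            elif not likely_active and sid in prev:   # recycle the previous value
                ll = prev[sid]
            else:                                     # batch-calculate ahead
                for dt, xf in enumerate(frames[t:t + BATCH]):
                    cache[(sid, t + dt)] = gaussian_loglike(xf, mean, var)
                ll = cache[(sid, t)]
            frame_scores[sid] = ll
        prev = frame_scores
        scores.append(frame_scores)
    return scores

states = {"a": (0.0, 1.0), "b": (5.0, 2.0)}
frames = [0.1, 0.2, 4.8, 5.1, 0.0]
print(score_states(frames, states))
```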
Takanori UNO Kouji ICHIKAWA Yuichi MABUCHI Atsushi NAKAMURA Yuji OKAZAKI Hideki ASAI
In this paper, we studied the use of a common-mode noise reduction technique for in-vehicle electronic equipment in an actual instrument design. We improved the circuit model of the common-mode noise that flows into the wire harness by adding the effect of a bypass capacitor located near the LSI. We analyzed the improved circuit model with a circuit simulator and verified the effectiveness of the noise reduction condition derived from it. It was also confirmed that offsetting the impedance mismatch in the PCB section requires a larger circuit constant than offsetting the impedance mismatch in the LSI section. An evaluation circuit board comprising an automotive microcomputer was prototyped, and experiments confirmed its common-mode noise reduction effect. The experimental results also revealed that the degree of impedance mismatch in the LSI section can be estimated by using a PCB with a known impedance. We further investigated the optimization of the impedance parameters, which is currently difficult for actual products. To satisfy the noise reduction condition, which involves numerous parameters, we proposed a design method that uses an optimization algorithm and an electromagnetic field simulator, and confirmed its effectiveness.
Takanobu OBA Takaaki HORI Atsushi NAKAMURA Akinori ITO
This paper describes a technique for overcoming the model shrinkage problem in automatic speech recognition (ASR), allowing application developers and users to control the model size with little degradation of accuracy. Models for ASR systems have recently tended to become large, and this can be a bottleneck for developers and users without special knowledge of ASR when introducing the ASR function. In particular, discriminative language models (DLMs), which have gained increasing attention as an approach to improving recognition accuracy, are usually designed in a high-dimensional parameter space. Our proposed method can be applied to linear models, including DLMs, in which the score of an input sample is given by the inner product of its features and the model parameters; it shrinks a model with a simple computation based on simple statistics, namely the square sums of the feature values appearing in a data set. Our experimental results show that the proposed method can shrink a DLM with little degradation in accuracy, and that it performs properly whether or not the data used for obtaining the statistics are the same as the data used for training the model.
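The sketch below illustrates one plausible reading of this idea: for a linear model, each parameter is ranked by its squared weight multiplied by the square sum of its feature values over a data set (a proxy for its contribution to the score), and the lowest-ranked parameters are pruned until a target size is reached. The ranking statistic and pruning schedule are assumptions made for illustration, not the paper's exact procedure.

```python
# Illustrative model shrinkage for a linear (e.g. discriminative language)
# model: rank each parameter by weight_i^2 * sum_over_data(feature_i^2) and
# drop the lowest-ranked ones. The ranking statistic is an assumption made
# for this sketch, not the paper's exact criterion.
from collections import defaultdict

def feature_square_sums(dataset):
    """dataset: iterable of {feature_name: value} dicts."""
    sq = defaultdict(float)
    for feats in dataset:
        for name, value in feats.items():
            sq[name] += value * value
    return sq

def shrink(weights, square_sums, keep):
    """Keep only the `keep` parameters with the largest w^2 * square_sum."""
    ranked = sorted(weights,
                    key=lambda n: weights[n] ** 2 * square_sums.get(n, 0.0),
                    reverse=True)
    return {n: weights[n] for n in ranked[:keep]}

def score(weights, feats):
    """Linear model: inner product of features and parameters."""
    return sum(weights.get(n, 0.0) * v for n, v in feats.items())

weights = {"w_good": 1.2, "w_rare": 0.9, "w_tiny": 0.01, "w_common": -0.4}
data = [{"w_good": 1.0, "w_common": 2.0}, {"w_good": 1.0, "w_tiny": 1.0}]
small = shrink(weights, feature_square_sums(data), keep=2)
print(small, score(small, data[0]))
```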