1-2hit |
Hiroki NOGUCHI Kazuo MIURA Tsuyoshi FUJINAGA Takanobu SUGAHARA Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
We propose a low-memory-bandwidth, high-efficiency VLSI architecture for 60-k word real-time continuous speech recognition. Our architecture includes a cache architecture using the locality of speech recognition, beam pruning using a dynamic threshold, two-stage language model searching, a parallel Gaussian Mixture Model (GMM) architecture based on the mixture level and frame level, a parallel Viterbi architecture, and pipeline operation between Viterbi transition and GMM processing. Results show that our architecture achieves 88.24% required frequency reduction (66.74 MHz) and 84.04% memory bandwidth reduction (549.91 MB/s) for real-time 60-k word continuous speech recognition.
Kosuke MIZUNO Hiroki NOGUCHI Guangji HE Yosuke TERACHI Tetsuya KAMINO Tsuyoshi FUJINAGA Shintaro IZUMI Yasuo ARIKI Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
This paper describes a SIFT (Scale Invariant Feature Transform) descriptor generation engine which features a VLSI oriented SIFT algorithm, three-stage pipelined architecture and novel systolic array architectures for Gaussian filtering and key-point extraction. The ROI-based scheme has been employed for the VLSI oriented algorithm. The novel systolic array architecture drastically reduces the number of operation cycle and memory access. The cycle counts of Gaussian filtering module is reduced by 82%, compared with the SIMD architecture. The number of memory accesses of the Gaussian filtering module and the key-point extraction module are reduced by 99.8% and 66% respectively, compared with the results obtained assuming the SIMD architecture. The proposed schemes provide processing capability for HDTV resolution video (1920 1080 pixels) at 30 frames per second (fps). The test chip has been fabricated in 65 nm CMOS technology and occupies 4.2 4.2 mm2 containing 1.1 M gates and 1.38 Mbit on-chip memory. The measured data demonstrates 38.2 mW power consumption at 78 MHz and 1.2 V.