Masayuki ODAGAWA Tetsushi KOIDE Toru TAMAKI Shigeto YOSHIDA Hiroshi MIENO Shinji TANAKA
This paper presents examination result of possibility for automatic unclear region detection in the CAD system for colorectal tumor with real time endoscopic video image. We confirmed that it is possible to realize the CAD system with navigation function of clear region which consists of unclear region detection by YOLO2 and classification by AlexNet and SVMs on customizable embedded DSP cores. Moreover, we confirmed the real time CAD system can be constructed by a low power ASIC using customizable embedded DSP cores.
Chung-Chien HSU Kah-Meng CHEONG Tai-Shih CHI Yu TSAO
This paper proposes a voice activity detection (VAD) algorithm based on an energy related feature of the frequency modulation of harmonics. A multi-resolution spectro-temporal analysis framework, which was developed to extract texture features of the audio signal from its Fourier spectrogram, is used to extract frequency modulation features of the speech signal. The proposed algorithm labels the voice active segments of the speech signal by comparing the energy related feature of the frequency modulation of harmonics with a threshold. Then, the proposed VAD is implemented on one of Texas Instruments (TI) digital signal processor (DSP) platforms for real-time operation. Simulations conducted on the DSP platform demonstrate the proposed VAD performs significantly better than three standard VADs, ITU-T G.729B, ETSI AMR1 and AMR2, in non-stationary noise in terms of the receiver operating characteristic (ROC) curves and the recognition rates from a practical distributed speech recognition (DSR) system.
The noise in digital images acquired by image sensors has complex characteristics due to the variety of noise sources. However, most noise reduction methods assume that an image has additive white Gaussian noise (AWGN) with a constant standard deviation, and thus such methods are not effective for use with image signal processors (ISPs). To efficiently reduce the noise in an ISP, we estimate a unified noise model for an image sensor that can handle shot noise, dark-current noise, and fixed-pattern noise (FPN) together, and then we adaptively reduce the image noise using an adaptive Smallest Univalue Segment Assimilating Nucleus ( SUSAN ) filter based on the unified noise model. Since our noise model is affected only by image sensor gain, the parameters for our noise model do not need to be re-configured depending on the contents of image. Therefore, the proposed noise model is suitable for use in an ISP. Our experimental results indicate that the proposed method reduces image sensor noise efficiently.
Masafumi KUMAMOTO Masahiro KIDA Ryotaro HIRAYAMA Yoshinobu KAJIKAWA Toru TANI Yoshimasa KURUMI
We propose an active noise control (ANC) system for reducing periodic noise generated in a high magnetic field such as noise generated from magnetic resonance imaging (MRI) devices (MR noise). The proposed ANC system utilizes optical microphones and piezoelectric loudspeakers, because specific acoustic equipment is required to overcome the high-field problem, and consists of a head-mounted structure to control noise near the user's ears and to compensate for the low output of the piezoelectric loudspeaker. Moreover, internal model control (IMC)-based feedback ANC is employed because the MR noise includes some periodic components and is predictable. Our experimental results demonstrate that the proposed ANC system (head-mounted structure) can significantly reduce MR noise by approximately 30 dB in a high field in an actual MRI room even if the imaging mode changes frequently.
Young-Geun LEE Han-Sam JUNG Ki-Seok CHUNG
Many DSP applications such as FIR filtering and DCT (discrete cosine transformation) require multiplication with constants. Therefore, optimizing the performance of constant multiplication improves the overall performance of these applications. It is well-known that shifting can replace a constant multiplication if the constant is a power of two. In this paper, we extend this idea in such a way that by employing more than two barrel shifters, we can design highly efficient constant multipliers. We have found that by using two or three shifters, we can generate a large set of constants. Using these constants, we can execute a typical set of FIR or DCT applications with few errors. Furthermore, with variable precision support, we can carry out a fairly large class of DSP applications with high computational efficiency. Compared to conventional multipliers, we can achieve power savings of up to 56% with negligible computational errors.
Miki SATO Akihiko SUGIYAMA Osamu HOSHUYAMA Nobuyuki YAMASHITA Yoshihiro FUJITA
This paper proposes near-field sound-source localization based on crosscorrelation of a signed binary code. The signed binary code eliminates multibit signal processing for simpler implementation. Explicit formulae with near-field assumption are derived for a two microphone scenario and extended to a three microphone case with front-rear discrimination. Adaptive threshold for enabling and disabling source localization is developed for robustness in noisy environment. The proposed sound-source localization algorithm is implemented on a fixed-point DSP. Evaluation results in a robot scenario demonstrate that near-field assumption and front-rear discrimination provides almost 40% improvement in DOA estimation. A correct detection rate of 85% is obtained by a robot in a home environment.
A very long instruction word (VLIW) digital signal processor (DSP), called ODiN, which could execute six instructions in a single cycle simultaneously, is designed and fabricated using 0.25 µm 1-ploy 5-metal standard cell static CMOS process. The ODiN core delivers maximum 600 MIPS with 100 MHz system clock. In order to achieve high performance operation, the designed core includes compact register files, orthogonal instruction set, single cycle operations for most instructions, and parallel processing based on software scheduling. In addition, a Viterbi decoder processor and a FFT processor that are embedded make it possible to implement software defined radio (SDR) applications efficiently.
Katsuhiko DEGAWA Takafumi AOKI Tatsuo HIGUCHI
This paper presents a Field-Programmable Digital Filter (FPDF) IC that employs carry-propagation-free redundant arithmetic algorithms for faster computation and multiple-valued current-mode circuit technology for high-density low-power implementation. The original contribution of this paper is to evaluate, through actual chip fabrication, the potential impact of multiple-valued current-mode circuit technology on the reduction of hardware complexity required for DSP-oriented programmable ICs. The prototype FPDF fabrication with 0.6 µm CMOS technology demonstrates that the chip area and power consumption can be reduced to 41% and 71%, respectively, compared with the standard binary logic implementation.
This paper presents a new DSP-oriented code optimization method to enhance performance by exploiting the specific architectural features of digital signal processors. In the proposed method, a source code is translated into the static single assignment form while preserving the high-level information related to loops and the address computation of array accesses. The information is used in generating hardware loop instructions and parallel instructions provided by most digital signal processors. In addition to the conventional control-data flow graph, a new graph is employed to make it easy to find auto-modification addressing modes efficiently. Experimental results on benchmark programs show that the proposed method is effective in improving performance.
A novel scheduling method for asynchronous multirate/multi-task processing by programmable digital signal processors (DSPs) has been developed. This mixed scheduling method combines static and dynamic scheduling, and avoids runtime overheads due to interrupts in context switching to realizes asynchronous multirate systems. The processing delay introduced when using static scheduling with static buffering is avoided by introducing deadline scheduling in the static schedule design. In the developed software design system, a block-diagram description language is extended to describe asynchronous multi-task processing. The scheduling method enables asynchronous multirate processing, such as arbitrary-sampling-ratio rate conversion, asynchronous interface, and multimedia applications, to be efficiently realized by programmable DSPs.
Nozomu TOGAWA Takashi SAKURAI Masao YANAGISAWA Tatsuo OHTSUKI
This letter proposes a hardware/software partitioning algorithm for digital signal processor cores with two register files. Given a compiled assembly code and a timing constraint of execution time, the proposed algorithm generates a processor core configuration with a new assembly code running on the generated processor core. The proposed algorithm considers two register files and determines the number of registers in each of register files. Moreover the algorithm considers two or more types of functional units for each arithmetic or logical operation and assigns functional units with small area to a processor core without causing performance penalty. A generated processor core will have small area compared with processor cores which have a single register file or those which consider only one type of functional units for each operation. The experimental results demonstrate the effectiveness and efficiency of the proposed algorithm.
Nozomu TOGAWA Yoshiharu KATAOKA Yuichiro MIYAOKA Masao YANAGISAWA Tatsuo OHTSUKI
Hardware/software partitioning is one of the key processes in a hardware/software cosynthesis system for digital signal processor cores. In hardware/software partitioning, area and delay estimation of a processor core plays an important role since the hardware/software partitioning process must determine which part of a processor core should be realized by hardware units and which part should be realized by a sequence of instructions based on execution time of an input application program and area of a synthesized processor core. This paper proposes area and delay estimation equations for digital signal processor cores. For area estimation, we show that total area for a processor core can be derived from the sum of area for a processor kernel and area for additional hardware units. Area for a processor kernel can be mainly obtained by minimum area for a processor kernel and overheads for adding hardware units and registers. Area for a hardware unit can be mainly obtained by its type and operation bit width. For delay estimation, we show that critical path delay for a processor core can be derived from the delay of a hardware unit which is on the critical path in the processor core. Experimental results demonstrate that errors of area estimation are less than 2% and errors of delay estimation are less than 2 ns when comparing estimated area and delay with logic-synthesized area and delay.
Naoki MIZUTANI Shogo MURAMATSU Hisakazu KIKUCHI
A unified polyphase representation of analysis and synthesis filter banks is introduced in this paper, and then the efficient implementation on digital signal processors (DSP) is investigated. Especially, the number of memory accesses, power consumption, processing accuracy and the required instruction cycles are discussed. Firstly, a unified representation is given, and then two types of procedures, SIMO system-based and MISO system-based procedures, are shown, where SIMO and MISO are abbreviations for single-input/multiple-output and multiple-input/single-output, respectively. These procedures are compared to each other. It is shown that the number of data load in SIMO system-based procedure is a half of that in MISO system-based procedure for two-channel filter banks. The implementation of M-channel filter banks is also discussed.
Kiyoshi NISHIKAWA Hitoshi KIYA
This paper proposes a fast implementation technique for RLS adaptive filters. The technique has an adjustable parameter to trade the throughput and the rate of convergence of the filter according to the applications. The conventional methods for improving the throughput do not have this kind of adjustability so that the proposed technique will expand the area of applications for the RLS algorithm. We show that the improvement of the throughput can be easily achieved by rearranging the formula of the RLS algorithm and that there are no need for faster PEs for the improvement.
Ulhaqsyed MOBIN Eiji HIRAKI Hiroshi TAKANO Mutsuo NAKAOKA
This paper describes an efficient simulation approach of a DSP controlled series-parallel resonant high frequency DC-DC power converter system. Proposed power conversion circuit simulation approach is based on a circuit equation, modeled by substituting time-varying switched resistor circuit in place of all the controllable and uncontrollable power semiconductor switching blocks of power converter circuits. An algebraic algorithm transforms the matrices of the circuit equation into the matrices of the state vector equation. Solution of state equation is by 3rd order Runge Kutta numerical integration method. Simulation results are illustrated and discussed together with experimental results.
Yasuo SUZUKI Kazuhiro UEHARA Masashi NAKATSUGAWA Yushi SHIRATO Shuji KUBOTA
Software radio base and personal station prototypes are proposed and implemented. The prototypes are composed of RF/IF, A/D and D/A, pre- and post-processors, CPU, and DSP parts. System software is partitioned into CPU program and DSP program to use processor resources effectively. They support various air interfaces, some of which are equivalent to the 384 kbit/s transmission rate PHS (personal handy phone system) and a 96 kbit/s transmission rate system. The base station can also be used as a communication bridge between two systems. In order to ease IF filter requirements, the zero-stuff method is employed. Basic transmission and receiving performances are evaluated in an experiment and their results agree well with those expected.
Yoshimasa NEGISHI Eiji WATANABE Akinori NISHIHARA Takeshi YANAGISAWA
Digital Signal Processors with complex arithmetic capability (DSP-C) are useful for various applications. In this paper, we propose a method for the effective implementation of specific circuits with real coefficients on DSP-C. DSP-C has special hardware such as a complex multiplier so that a complex calculation can be performed with only one instruction. First, we show that nodes with two real coefficient input branches can be implemented by complex multiplications. We apply this implementation to 2D circuits and transversal circuits with real coefficients. Next, we introduce a new computational mode (Advanced mode) and a new multiplier into PSI, a kind of DSP-C which has been proposed already, in order to process the circuits effectively. The effectiveness of the proposed method is shown by simulation in the last part.
Tetsuya MATSUMURA Hiroshi SEGAWA Satoshi KUMAKI Yoshinori MATSUURA Atsuo HANAMI Kazuya ISHIHARA Shin-ichi NAKAGAWA Tadashi KASEZAWA Yoshihide AJIOKA Atsushi MAEDA Masahiko YOSHIMOTO Tadashi SUMI
This paper describes a chip set architecture and its implementation for programmable MPEG2 MP@ML (main profile at main level) video encoder. The chip set features a functional partitioning architecture based on the MPEG2 layer structure. Using this partitioning scheme, an optimized system configuration with double bus structure is proposed. In addition, a hybrid architecture with dual video-oriented on-chip RISC processors and dedicated hardware and a hierarchical pipeline scheme covering all layers are newly introduced to realize flexibility. Also, effective motion estimation is achieved by a scalable solution for high picture quality. Adopting these features, three kinds of VLSI have been developed using 0. 5 micron double metal CMOS technology. The chip set consists of a controller-LSI (C-LSI), a macroblock level pixel processor-LSI (P-LSI) and a motion estimation-LSI (ME-LSI). The chip set combined with synchronous DRAMs (SDRAM) supports all the layer processing including rate-control and realizes real-time encoding for ITU-R-601 resolution video (720480 pixels at 30 frames/s) with glue less logic. The exhaustive motion estimation capability is scalable up to 63. 5 and 15. 5 in the horizontal and vertical directions respectively. This chip set solution realizes a low cost MPEG2 video encoder system with excellent video quality on a single PC extension board. The evaluation system and application development environment is also introduced.
Nobuhiko SUGINO Hironobu MIYAZAKI Akinori NISHIHARA
Many digital signal processors (DSPs) employ indirect addressing using address registers (ARs) to indicate their memory addresses, which often leads to overhead. This paper presents methods to efficiently allocate addresses for variables in a given program so that overhead in AR update operations is reduced. Memory addressing model is generalized in such a way that AR can be updated at the codes without memory accesses. An efficient memory address allocation is obtained by a method based on the graph linearization algorithm, which takes account of the number of possible AR update operations for every memory access. In order to utilize multiple ARs, methods to assign variables into ARs are also investigated. The proposed methods are applied to the compiler for µPD77230 (NEC) and generated codes for several examples prove effectiveness of these methods.
Eiichi TERAOKA Toru KENGAKU Ikuo YASUI Kazuyuki ISHIKAWA Takahiro MATSUO Hideyuki WAKADA Narumi SAKASHITA Yukihiko SHIMAZU Takeshi TOKUDA
Built-in self-test (BIST) has been applied to test an analog to digital converter (ADC) and a digital to analog converter (DAC) embedded in a DSP-core ASIC. The eight performance characteristics of the ADC and the DAC designed in accordance with the ITU-T recommendations are measured using the BIST. Three of the eight characteristics - the attenuation/frequency distortion, the variation of gain with input level, and the signal-to-total distortion - have been evaluated and the measured results have shown good agreement with measured results by conventional tests. In the BIST operation, the DSP-core generates input stimulus and analyzes output response by control of the self-test program, The sizes of the self-test program and coefficient data are 822 words of the IROM and 384 words of the data ROM, respectively. This area overhead is less than 0.5% of total chip area. Test-time by the BIST is reduced to approximately 3.2 seconds, which is one-tenth that of conventional testing. The mixed-signal DSP-core ASIC is testable with only logic test equipment, and as a result, test-cost - that is test investment and test-time - is reduced compared with conventional test methods.