IEICE TRANSACTIONS on Electronics


Volume E77-C No.7  (Publication Date:1994/07/25)

    Special Issue on Super Chip for Intelligent Integrated Systems
  • FOREWORD

    Michitaka KAMEYAMA  

     
    FOREWORD

      Page(s):
    1021-1022
  • Overview of the Super Database Computer (SDC-I)

    Masaru KITSUREGAWA  Weikang YANG  Satoshi HIRANO  Masanobu HARADA  Minoru NAKAMURA  Kazuhiro SUZUKI  Takayuki TAMURA  Mikio TAKAGI  

     
    INVITED PAPER

      Page(s):
    1023-1031

    This paper presents an overview of the SDC-I (Super Database Computer I) developed at the University of Tokyo, Japan. The purpose of the project was to build a high-performance SQL server which emphasizes query processing over transaction processing. Recently, relational database systems have increasingly been used for heavy decision support queries, which include many join, aggregation, and order-by operations. At present, high-end mainframes are used for these applications, requiring several hours in some cases. While the system architecture for high-traffic transaction processing systems is well established, that for ad hoc query processing has not yet been adequately understood. SDC-I proved that a parallel machine could attain significant performance improvements over a conventional sequential machine through the exploitation of the high degree of parallelism present in relational query processing. A unique bucket spreading parallel hash join algorithm is employed in SDC, which makes the system very robust in the presence of data skew and allows SDC to attain almost linear performance scalability. SDC adopts a hybrid parallel architecture: globally it is a shared-nothing architecture, that is, modules are connected through the multistage network, but each module itself is a symmetric multiprocessor system. Although most of the hardware elements use commodity microprocessors for improved performance to cost, only the interconnection network incorporates the special function to support our parallel relational algorithm. Data movement over the memory and the network, rather than computation, is heavy for I/O-intensive database processing. A dedicated software system was carefully designed for efficient data movement. The implemented prototype consists of two modules. Its hardware and software organization is described. A performance monitoring tool was developed to visualize the system activities, which showed that SDC-I works very efficiently.
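
    As a rough illustration of the ideas named in the abstract above, the following Python sketch shows a plain hash-partitioned join together with a greedy, size-aware assignment of buckets to workers. It is not the bucket spreading algorithm of SDC-I, whose details are not given here. The function names (`partition`, `assign_buckets`, `hash_join`) and the largest-first balancing rule are assumptions chosen for illustration only.

```python
from collections import defaultdict


def partition(rows, key_index, num_buckets):
    """Split rows into buckets by hashing the join key (hypothetical helper)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key_index]) % num_buckets].append(row)
    return buckets


def assign_buckets(bucket_sizes, num_workers):
    """Greedy largest-first: give each bucket to the currently lightest worker.

    This is an illustrative way to even out skewed bucket sizes, not the
    method used in the paper.
    """
    loads = [0] * num_workers
    owner = {}
    for bucket_id, size in sorted(bucket_sizes.items(), key=lambda kv: -kv[1]):
        worker = loads.index(min(loads))
        owner[bucket_id] = worker
        loads[worker] += size
    return owner, loads


def hash_join(left, right, left_key, right_key, num_buckets=8):
    """Join two lists of tuples on equal keys, one bucket at a time."""
    left_buckets = partition(left, left_key, num_buckets)
    right_buckets = partition(right, right_key, num_buckets)
    result = []
    for bucket_id, build_rows in left_buckets.items():
        # Build an in-memory hash table for this bucket of the left relation.
        table = defaultdict(list)
        for row in build_rows:
            table[row[left_key]].append(row)
        # Probe it with the matching bucket of the right relation.
        for probe in right_buckets.get(bucket_id, []):
            for match in table.get(probe[right_key], []):
                result.append(match + probe)
    return result
```

    Because each bucket pair can be joined independently, the per-bucket loop is the part that a parallel machine could distribute; `assign_buckets` shows one simple way that unequal bucket sizes might be balanced across workers.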

  • The Concept of Four-Terminal Devices and Its Significance in the Implementation of Intelligent Integrated Circuits

    Tadahiro OHMI  Tadashi SHIBATA  

     
    PAPER

      Page(s):
    1032-1041

    It is demonstrated that the enhancement in the functional capability of an elemental transistor is quite essential in developing human-like intelligent electronic systems. For this purpose we have introduced the concept of four-terminal devices. Four-terminal devices have an additional dimension in the degree of freedom in controlling currents as compared to three-terminal devices like bipolar and MOS transistors. The importance of the four-terminal device concept is demonstrated taking the neuron MOS transistor (abbreviated as neuMOS or νMOS) and its circuit applications as examples. We have found that any Boolean function can be realized by a two-stage configuration of νMOS inverters. In addition, the variable threshold nature of the device allows us to build real-time reconfigurable logic circuits (no floating gate charging effect is involved in varying the threshold). Based on the principle, we have developed Soft-Hardware Logic Circuits and Real-Time Rule-Variable Data Matching Circuits. A winner-take-all circuit which finds the largest signal by hardware parallel processing has also been developed. The circuit is applied to building an associative memory which is different from the Hopfield network in both principle and operation. The hardware algorithm in which binary, multivalue, and analog operations are merged at the very device level is quite essential to establish intelligent information processing systems based on highly flexible, real-time programmable hardware realized by four-terminal devices.

  • Low-Power 8-Valued Cellular Array VLSI for High-Speed Image Processing

    Takahiro HANYU  Maho KUWAHARA  Tatsuo HIGUCHI  

     
    PAPER

      Page(s):
    1042-1048

    This paper presents a low-power 8-valued cellular array VLSI for high-speed image processing based on logical neighborhood operations with 3×3 windows. This array is useful for performing low-level image processing such as noise removal and edge detection, in intelligent integrated systems where immediate response to input change as well as high throughput is needed. In order to achieve high-speed image processing, template matching for neighborhood operations can be performed in parallel on each row. The rows of the image are processed in a pipelined manner. The direct 8-valued encoding of the matched results for three different 3×3 masks makes it possible to reduce the number of operations by one-third. In the hardware implementation, the matching cell for logical neighborhood operations can be implemented compactly using MOS transistors with different threshold voltages, which are programmed by multiple ion implants. Moreover, a new literal circuit for detecting multiple-valued signals using a dynamic design style eliminates hazards due to timing skews in the difference of various input voltage levels, so that the dynamic power dissipation of the proposed circuit is greatly reduced. Finally, it is demonstrated that the processing time of the proposed cellular array is reduced to about 40 percent in comparison with that of a corresponding binary circuit when power dissipation/area = 0.3 W/100 mm².

  • A Discrete Fourier Analyzer Based on Analog VLSI Technology

    Shoji KAWAHITO  Kazuyuki TAKEDA  Takanori NISHIMURA  Yoshiaki TADOKORO  

     
    PAPER

      Page(s):
    1049-1056

    This paper presents a discrete Fourier analyzer using analog VLSI technology. An analog current-mode technique is employed for implementing it by a regular array structure based on the straightforward discrete Fourier transform (DFT) algorithm. The basic components are a one-dimensional (1-D) analog current-mode multiplier array for fixed coefficient multiplication, a two-dimensional (2-D) analog switch array, and wired summations. The proposed scheme can process an N-point DFT in a time proportional to N. The possibility of realizing the analog DFT VLSI based on 1 µm technology is discussed from the viewpoints of precision, speed, area, and power dissipation. In the case of a 1024-point DFT, the standard deviation of the total error is estimated to be about 2%, the latency, or processing time, is about 110 µs, and the signal sample rate based on pipelined operation is about 4.7 MHz. A prototype MOS integrated circuit of the 16-point multiplier array has been implemented and a typical operation using the multiplier array has been confirmed.
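
    For readers unfamiliar with the straightforward DFT algorithm mentioned in the abstract above, a minimal software sketch in Python follows. It computes X[k] as a sum of input samples multiplied by fixed coefficients, of which only N distinct values exist. This is a numerical illustration only; the function name `dft_direct` is an assumption, and how the paper maps these multiplications and summations onto its analog array is not described here.

```python
import cmath


def dft_direct(samples):
    """Direct DFT: X[k] = sum over n of x[n] * W**(k*n), with W = exp(-2j*pi/N)."""
    n_points = len(samples)
    # Fixed coefficients (twiddle factors); W**(k*n) depends only on (k*n) mod N.
    twiddles = [cmath.exp(-2j * cmath.pi * m / n_points) for m in range(n_points)]
    spectrum = []
    for k in range(n_points):
        acc = 0j
        for n, x in enumerate(samples):
            # One fixed-coefficient multiplication followed by accumulation.
            acc += x * twiddles[(k * n) % n_points]
        spectrum.append(acc)
    return spectrum
```

    In software the two nested loops take time proportional to N squared; a time proportional to N, as stated in the abstract, requires one of the two loops to be carried out in parallel by hardware.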

  • Quantizer Neuron Chip (QNC) with Multichip Extendable Architecture

    Masakatsu MARUYAMA  Hiroyuki NAKAHIRA  Shiro SAKIYAMA  Toshiyuki KOHDA  Susumu MARUNO  Yasuharu SHIMEKI  

     
    PAPER

      Page(s):
    1057-1064

    This paper discusses a digital neuroprocessor named Quantizer Neuron Chip (QNC) employing the Quantizer Neuron model and two newly developed schemes: "concurrent processing of quantizer neuron" and "removal of ineffective calculations". QNC simulates neural networks named the Multi-Functional Layered Network (MFLN) with 64 output neurons, 4672 quantizer neurons and two million synaptic weights and can be used for character or image recognition and learning. The processing speed of the chip achieved 1.6 µs per output neuron for recognition and 20 million connections updated per second (MCUPS) for learning. In addition, QNC can execute multichip operation for increasing the size of networks. We applied QNC to handwritten numeral recognition and realized high-speed recognition and learning. QNC is implemented in a 1.2 µm double-metal CMOS sea-of-gates technology and contains 27,000 gates on a 10.99×10.93 mm² chip.

  • A Memory-Based Recurrent Neural Architecture for Chip Emulating Cortical Visual Processing

    Luigi RAFFO  Silvio P. SABATINI  Giacomo INDIVERI  Giovanni NATERI  Giacomo M. BISIO  

     
    PAPER

      Page(s):
    1065-1074

    The paper describes the architecture and the simulated performance of a memory-based chip that emulates human cortical processing in early visual tasks, such as texture segregation. The featural elements present in an image are extracted by a convolution block and subsequently processed by the cortical chip, whose neurons, organized into three layers, gain relational descriptions (intelligent processing) through recurrent inhibitory/excitatory interactions between both inter- and intra-layer parallel pathways. The digital implementation of this architecture directly maps the set of equations determining the status of the cortical network to achieve an optimal exploitation of VLSI technology in neural computation. Neurons are mapped into a memory matrix whose elements are updated through a programmable computational unit that implements synaptic interconnections. By using 0.5 µm CMOS technology, full cortical image processing can be attained on a single chip (20×20 mm² die) at a rate higher than 70 frames/second, for images of 256×256 pixels.

  • 7.5 MFLIPS Fuzzy Microprocessor Using SIMD and Logic-in-Memory Structure

    Mamoru SASAKI  Fumio UENO  

     
    PAPER

      Page(s):
    1075-1082

    A fuzzy microprocessor is developed using a 1.2 µm CMOS process. The inference scheme for the if-then fuzzy rules consists of three main steps, i.e., the if-part process, the then-part process, and defuzzification. In order to realize very high-speed inference and moderate programmability, we introduce three different types of structures, i.e., SIMD, logic-in-memory, and Wallace tree structures, which are suitable for the three main steps. The inference speed including defuzzification is 7.5 MFLIPS, which is 12.9 times higher than the previous VLSI implementation, and it can carry out many rules (960 rules) and many input and output variables (16 variables).

  • Graceful Degradation for Multiprocessor Realization of Maximally Flat FIR Digital Filters

    Saed SAMADI  Akinori NISHIHARA  Nobuo FUJII  

     
    PAPER

      Page(s):
    1083-1091

    In this paper we propose a method for increasing the reliability in multiprocessor realization of lowpass and highpass FIR digital filters possessing a maximally flat magnitude response. This method is based on the use of an array realization of the filter which has been proposed earlier by the authors. It is shown that if a processing module of the array functions erroneously, it is possible to exclude the module and still obtain a lowpass FIR filter. However, as a price we must tolerate a slight degradation in the magnitude response of the filter that is equivalent to a wider transition band. We also analyze the behavior of the filter when our proposed schemes are implemented on more than one module. The justification of our approach is that a slight degradation of the spectral characteristics of a filter may be well tolerated in most filtering applications, and thus a graceful degradation in the frequency domain can sufficiently reduce the vulnerability to errors.

  • Performance Evaluation of a Processing Element for an On-Chip Multiprocessor

    Masafumi TAKAHASHI  Hiroshige FUJII  Emi KANEKO  Takeshi YOSHIDA  Toshinori SATO  Hiroyuki TAKANO  Haruyuki TAGO  Seigo SUZUKI  Nobuyuki GOTO  

     
    PAPER

      Page(s):
    1092-1100

    A 250-MIPS, 125-MFLOPS peak performance processing element (PE), which is being developed for an on-chip multiprocessor, has been modeled and evaluated. The PE includes the following new architecture components: an FPU shared by several IUs in order to increase the efficiency of the FPU pipelines, an on-chip data cache with a prefetch mechanism to reduce clock cycles waiting for memory, and an interface to high-speed DRAM, such as Rambus DRAM and Synchronous DRAM. As a result, a PE model with an FPU shared by four or eight IUs causes only a 10% performance reduction compared to a model with an unshared FPU while saving the cost of three FPUs. Furthermore, a PE model with prefetch operates 1.2 to 1.8 times faster than a model without prefetch at a 250 MHz clock rate when the Rambus DRAM is connected. It becomes clear that this PE architecture can bring a high effective performance at over 250 MHz, and is cost-effective for the on-chip multiprocessor.

  • High-Level Synthesis of VLSI Processors for Intelligent Integrated Systems

    Yasuaki SAWANO  Bumchul KIM  Michitaka KAMEYAMA  

     
    PAPER

      Page(s):
    1101-1107

    In intelligent integrated systems such as robotics for autonomous work, it is essential to respond to changes in the environment very quickly. Therefore, the development of special-purpose VLSI processors for intelligent integrated systems with small latency becomes a very important subject. In this paper, we present a scheduling algorithm for high-level synthesis. The input to the scheduler is a behavioral description which is viewed as a data flow graph (DFG). The scheduler minimizes the latency, which is the delay of the critical path in the DFG, and minimizes the number of functional units and buses by improving the utilization rates. By using integer linear programming, the scheduler optimally assigns nodes and arcs in the DFG into steps.

  • Design of a CAM-Based Collision Detection VLSI Processor for Robotics

    Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER

      Page(s):
    1108-1115

    Real-time collision detection is one of the most important intelligent processing tasks in robotics. In collision detection, a large storage capacity is usually required to store the 3-dimensional information on the obstacles located in a workspace. Moreover, high computational power is essential in not only coordinate transformation but also the matching operation. In the proposed collision detection VLSI processor, the matching operation is drastically accelerated by using a content-addressable memory (CAM). A new obstacle representation based on a union of rectangular solids is also used to reduce the obstacle memory capacity, so that the collision detection can be performed by only magnitude comparison in parallel. A parallel architecture using several identical processor elements (PEs) is employed to perform the coordinate transformation at high speed, and each PE performs coordinate transformation based on the COordinate Rotation DIgital Computation (CORDIC) algorithms. When 16 PEs and a 144-kb CAM are used, the performance is evaluated to be 90 ms.
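
    The CORDIC algorithm named in the abstract above rotates a vector using a sequence of fixed elementary rotations. A minimal floating-point sketch in Python is given below as an illustration of the general technique, not of the processor described in the paper. The function name `cordic_rotate` and the default of 24 iterations are assumptions; in hardware the multiplications by powers of two would be bit shifts.

```python
import math


def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) counterclockwise by `angle` radians (|angle| up to about 1.74).

    Rotation-mode CORDIC: each step rotates by +/- atan(2**-i), and the
    accumulated scale factor is divided out at the end.
    """
    # Elementary angles; in hardware these would come from a small table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Constant gain introduced by the un-normalized elementary rotations.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle  # remaining angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Multiplying by 2**-i corresponds to a right shift in fixed point.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x / gain, y / gain
```

    For example, `cordic_rotate(1.0, 0.0, math.pi / 4)` returns a point close to (cos 45°, sin 45°). Larger angles would need a preliminary quadrant correction, which this sketch omits.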

  • A VLSI-Oriented Model-Based Robot Vision Processor for 3-D Instrumentation and Object Recognition

    Yoshifumi SASAKI  Michitaka KAMEYAMA  

     
    PAPER

      Page(s):
    1116-1122

    In a robot vision system, enormous computational power is required to perform three-dimensional (3-D) instrumentation and object recognition. However, many kinds of complex and irregular operations are required to make accurate 3-D instrumentation and object recognition in the conventional method for software implementation. In this paper, a VLSI-oriented Model-Based Robot Vision (MBRV) processor is proposed for high-speed and accurate 3-D instrumentation and object recognition. An input image is compared with two-dimensional (2-D) silhouette images which are generated from the 3-D object models by means of perspective projection. Because the MBRV algorithm always gives the candidates for the accurate 3-D instrumentation and object recognition result with simple and regular procedures, it is suitable for implementation as a VLSI processor. A highly parallel architecture is employed in the VLSI processor to reduce the latency between the image acquisition and the output generation of the 3-D instrumentation and object recognition results. As a result, 3-D instrumentation and object recognition can be performed 10000 times faster than on a 28.5 MIPS workstation.

  • Design of a Reconfigurable Parallel Processor for Digital Control Using FPGAs

    Yoshichika FUJIOKA  Michitaka KAMEYAMA  Nobuhiro TOMABECHI  

     
    PAPER

      Page(s):
    1123-1130

    In digital control, it is essential to make the delay time for a large number of multiply-additions small because of sensor feedback. To meet the requirement, an architecture of the reconfigurable parallel processor using field-programmable gate arrays (FPGAs) is proposed. Although the performance is drastically increased in the full custom VLSI implementation, even the reconfigurable parallel processor using FPGAs becomes useful for many practical digital control applications. The performance evaluation shows that the delay time for the resolved acceleration control computation of a twelve-degrees-of-freedom (DOF) redundant manipulator becomes about 70 µs, which is about seventeen times faster than that of a parallel processor approach using conventional digital signal processors (DSPs).

  • Regular Section
  • Ultimate Lower Bound of Power for MOS Integrated Circuits and Their Applications

    Kunihiro ASADA  Mike LEE  

     
    PAPER-Integrated Electronics

      Page(s):
    1131-1137

    The ultimate minimum energy of the switching mechanism for MOS integrated circuits has been studied. This report elucidates the evaluation methods for minimum switching energy of instantaneous discharged mechanism after charging one, namely, recycled energy of the MOS device. Two approaches are implemented to capture this concept. One is a switching energy by the time-dependent gate capacitance (TDGC) model; the other is by results developed by transient device simulation, which was implemented using the Finite Element Method (FEM). It is understood that the non-recycled minimum switching energies by both approaches show a good agreement. The recycled energies are then calculated at various sub-micron gate MOS/SOI devices and can be ultra-low power of the MOS integrated circuits, which may be possible to build recycled power circuitry for super energy-saving in the future new MOS LSI. From those results, (1) the TDGC is simultaneously verified by consistent match of the non-recycled minimum switching energies; (2) the recycled switching energy is found to be the ultimate lower bound of power for MOS device; (3) the recycled switching energy can be saved up to around 80% of that of current MOS LSI.

  • Amplification Characteristics of Waveguide Type Optical Amplifier Using Nd Doped Garnet Thin Film

    Mitsuhiro WADA  Yasumitsu MIYAZAKI  

     
    PAPER-Opto-Electronics

      Page(s):
    1138-1145

    This paper proposes a waveguide type optical amplifier which is constructed in 1.3 at.% Nd doped yttrium gallium garnet thin films deposited on yttrium aluminum garnet substrates by using RF sputtering. The crystalline thin film with a satisfactory stoichiometric composition is obtained by annealing at 1000°C after depositing at 600°C. The spectral properties and the optical amplification characteristics of the thin film waveguide are measured. An optical propagation loss of 2.2 dB/cm is achieved at the wavelength of 1061.5 nm. The absorption peak of the thin film is located at 808 nm, and fluorescence peaks at about 1.06 µm and 1.3 µm suggest the possibility of optical amplification. For the wavelength of 1061.5 nm, a maximum gain of 4.4 dB and an S/N ratio of 12.6 dB are obtained at a signal power of 10 µW and a pump power of about 14 mW. The pump efficiency is more than 0.3 dB/mW.
