Author Search Result

[Author] Jun TAKEDA(4hit)

  • A Method for Estimating the Mean-Squared Error of Distributed Arithmetic

    Jun TAKEDA  Shin-ichi URAMOTO  Masahiko YOSHIMOTO  

     
    PAPER-Digital Signal Processing

    Vol: E77-A No:1  Page(s): 272-280

    It is important for LSI system designers to estimate computational errors when designing LSI's for numeric computations. Both for the prediction of the errors at an early stage of designing and for the choice of a proper hardware configuration to achieve a target performance, it is desirable that the errors can be estimated in terms of a minimum of parameters. This paper presents a theoretical error analysis of multiply-accumulation implemented by distributed arithmetic (DA) and proposes a new method for estimating the mean-squared error. DA is a method of implementing the multiply-accumulation that is defined as an inner product of an input vector and a fixed coefficient vector. Using a ROM which stores partial products, DA calculates the output by accumulating the partial products bit-serially. As DA uses no parallel multipliers, it needs a smaller chip area than methods using parallel multipliers. Thus DA is effectively utilized for the LSI implementation of a digital signal processing system which requires the multiply-accumulation. It has been known that, if the input data are uniformly distributed, the mean-squared error of the multiply-accumulation implemented by DA is a function of only the word lengths of the input, the output, and the ROM. The proposed method for the error estimation can calculate the mean-squared error by using the same parameters even when the input data are not uniformly distributed. The basic idea of the method is to regard the input data as a combination of uniformly distributed partial data with different word lengths. Then the mean-squared error can be predicted as a weighted sum of the contribution of each partial data, where the weight is the ratio of the partial data to the total input data. Finally, the method is applied to a two-dimensional inverse discrete cosine transform (IDCT) and the practicability of the method is confirmed by computer simulations of the IDCT implemented by DA.
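    The bit-serial ROM lookup described in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names `build_da_rom` and `da_inner_product`, the use of integer coefficients, and the two's-complement input format are assumptions made for illustration. Because the sketch uses exact Python integers, it shows none of the rounding error the paper analyzes; that error arises only once the ROM and accumulator word lengths are limited.

```python
def build_da_rom(coeffs):
    """Precompute the DA lookup table (hypothetical helper).

    Entry `addr` holds the sum of coeffs[k] for every bit k set in `addr`,
    i.e. one partial product for each possible pattern of input bits.
    """
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(rom, inputs, word_length):
    """Inner product of two's-complement inputs with the fixed coefficients.

    One bit position of all inputs is processed per step; no multiplier
    is used, only a ROM lookup, a shift, and an add or subtract.
    """
    acc = 0
    for bit in range(word_length):
        # Build the ROM address from bit `bit` of every input word.
        addr = 0
        for k, x in enumerate(inputs):
            addr |= ((x >> bit) & 1) << k
        partial = rom[addr]
        if bit == word_length - 1:
            # The sign bit of a two's-complement word has negative weight.
            acc -= partial << bit
        else:
            acc += partial << bit
    return acc


coeffs = [3, -5, 7, 2]
inputs = [-8, 7, -1, 4]          # each fits in a 4-bit two's-complement word
rom = build_da_rom(coeffs)
result = da_inner_product(rom, inputs, word_length=4)
print(result, sum(c * x for c, x in zip(coeffs, inputs)))
```

    Note that the ROM has 2^N entries for N coefficients, which is why the ROM word length appears alongside the input and output word lengths as a design parameter.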

  • Optical Flow Detection System Using a Parallel Processor NEURO4

    Jun TAKEDA  Ken-ichi TANAKA  Kazuo KYUMA  

     
    PAPER

    Vol: E81-A No:3  Page(s): 439-445

    An image recognition system using NEURO4, a programmable parallel processor, is described. Optical flow is the velocity field that an observer detects on a two-dimensional image and gives useful information, such as edges, about moving objects. The processing time for detecting optical flow on the NEURO4 system was analyzed. Owing to the parallel computation scheme, the processing time on the NEURO4 system is proportional to the square root of the size of images, while conventional sequential computers need time in proportion to the size. This analysis was verified by experiments using the NEURO4 system. When the size of an image is 84 × 84, the NEURO4 system can detect optical flow in less than 10 seconds. In this case the NEURO4 system is 23 times faster than a workstation, Sparc Station 20 (SS20). The larger the size of images becomes, the faster the NEURO4 system can detect optical flow than conventional sequential computers like SS20. Furthermore, the paralleling effect increases in proportion to the number of connected NEURO4 chips by a ring expansion scheme. Therefore, the NEURO4 system is useful for developing moving image recognition algorithms which require a large amount of processing time.

  • A 100-MHz 2-D Discrete Cosine Transform Core Processor

    Shin-ichi URAMOTO  Yoshitsugu INOUE  Akihiko TAKABATAKE  Jun TAKEDA  Yukihiro YAMASHITA  Hideyuki TERANE  Masahiko YOSHIMOTO  

     
    PAPER

    Vol: E75-C No:4  Page(s): 390-397

    The discrete cosine transform (DCT) has been recognized as one of the standard techniques in image compression. Therefore, a core processor which rapidly computes DCT has become a key component in image compression VLSI's. This paper describes a 100-MHz two-dimensional DCT core processor which is applicable to the real-time processing of HDTV signals. An excellent architecture utilizing a fast DCT algorithm and multiplier accumulators based on distributed arithmetic have contributed to reducing the hardware amount and to enhancing the speed performance. A layout scheme with a column-interleaved memory and a new ROM circuit are introduced for the efficient implementation of memory-based signal processing circuits. Furthermore, mean values of errors generated in the core were minimized to enhance the computational accuracy with the word-length constraints. Consequently, it features the fastest operating speed and the smallest area with its sufficient accuracy satisfying the specifications in CCITT recommendation H.261. The core integrates about 102K transistors, and occupies 21 mm² using 0.8-µm double-metal CMOS technology.

  • An MPEG2 Video Decoder LSI with Hierarchical Control Mechanism

    Shin-ichi URAMOTO  Akihiko TAKABATAKE  Takashi HASHIMOTO  Jun TAKEDA  Gen-ichi TANAKA  Tsuyoshi YAMADA  Yukio KODAMA  Atsushi MAEDA  Toshiaki SHIMADA  Shun-ichi SEKIGUCHI  Tokumichi MURAKAMI  Masahiko YOSHIMOTO  

     
    PAPER

    Vol: E78-C No:12  Page(s): 1697-1708

    An MPEG2 video decoder LSI fully compliant with MPEG2 main profile at main level is described. The video decoder LSI is a single chip solution which can implement MPEG2 video decoding with conventional DRAMs. The LSI features an architecture based on dedicated decoding hardware so as to gain the necessary computational power for real-time processing of ITU-R R.601 size video. The variable length decoder (VLD), owing to our "one symbol decoding in one cycle" policy and a special circuit for detecting unique start-codes, achieved bitstream decoding up to 18 Mbps with a normal decoding process. It also realized fast searching for the next start-code in the picture skipping and error recovery processes. The video decoder LSI also features a hierarchical and adaptive control mechanism. This control mechanism decreases the dead time of the decoding circuits and raises the efficiency of data transfer via the local DRAM port. It also contributes to the realization of error concealment and error recovery processes. This chip is capable of processing NTSC-resolution video encoded in MPEG2 MP@ML in real-time at 27 MHz operation. The chip integrates about 1200 K transistors using 0.5 µm double metal CMOS technology. The hardware-based architecture results in low power dissipation: the chip consumes 1.4 W of power at 3.3 V supply voltage and is housed in a plastic QFP.
