Tetsuya MATSUMURA Satoshi KUMAKI Hiroshi SEGAWA Kazuya ISHIHARA Atsuo HANAMI Yoshinori MATSUURA Stefan SCOTZNIOVSKY Hidehiro TAKATA Akira YAMADA Shu MURAYAMA Tetsuro WADA Hideo OHIRA Toshiaki SHIMADA Ken-ichi ASANO Toyohiko YOSHIDA Masahiko YOSHIMOTO Koji TSUCHIHASHI Yasutaka HORIBA
A single-chip MPEG-2 video, audio, and system encoder LSI has been developed. It performs concurrent real-time processing of MPEG-2 422P@ML video encoding, 2-channel Dolby Digital or MPEG-1 audio encoding, and system encoding that generates a multiplexed transport stream (TS) or a program stream (PS). An advanced hybrid architecture, which combines a high-performance VLIW media-processor D30V and hardwired video processing circuits, has been adopted to satisfy the demands of both high flexibility and enormous computational capability. A unified control scheme has been newly proposed that hierarchically manages adaptive task priority control over asynchronous video, audio, and system encoding processes in order to achieve real-time concurrent processing using a single D30V. Dual dedicated motion estimation cores consisting of a coarse ME core (CME) for wide range searches and a fine ME core (FME) for precise searches have been integrated to produce high picture quality while using a small amount of hardware. Adopting these features, a single-chip encoder has been fabricated using 0.25-micron 4-layer metal CMOS technology, and integrated into a 14.2 mm × 14.2 mm die with 11 million transistors.
Hiroshi SEGAWA Yoshinori MATSUURA Satoshi KUMAKI Tetsuya MATSUMURA Stefan SCOTZNIOVSKY Shu MURAYAMA Tetsuro WADA Ayako HARADA Eiji OHARA Ken-ichi ASANO Toyohiko YOSHIDA Yasutaka HORIBA
This paper describes an embedded software scheme for a single-chip MPEG-2 encoder that executes concurrent video, audio, and system encoding in real time. The software features a scalable module structure, which is hierarchically composed and has expandable plug-in modules. For increased applicability, several task-modules are prepared for the respective video, audio, and system processing. In addition, an effective task management scheme that features polling and interrupt-based task switching has been proposed in order to achieve real-time operation. The software having these features and including all task-modules is implemented on a single media-processor D30V on a single-chip MPEG-2 video, audio, and system encoder. This encoder realizes real-time MPEG-2 video encoding, Dolby Digital or MPEG-1 audio encoding, and system encoding that generates TS or PS over 50 Mbps for various applications. Assuming a DVD or DTV encoder system, the software is reconstructed with less than 56.6 kbytes of instruction and 145.6 MIPS performance. The single media-processor with 64 kbytes of instruction RAM and 162 MIPS performance, running at a clock rate of 162 MHz, can successfully accomplish real-time operation with the proposed embedded software.
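The resource figures quoted in this abstract can be compared directly. The following is a minimal sketch of that arithmetic; the function name `headroom` is an illustrative choice and does not come from the paper. Because the abstract gives the instruction size only as an upper bound ("less than 56.6 kbytes"), the instruction RAM margin computed here is a lower bound.

```python
def headroom(required, available):
    """Return the unused fraction of a resource budget."""
    return (available - required) / available

# Figures quoted in the abstract for a DVD or DTV encoder configuration.
mips_margin = headroom(145.6, 162.0)   # software load vs. D30V capacity
iram_margin = headroom(56.6, 64.0)     # instruction size vs. instruction RAM

print(f"MIPS headroom: {mips_margin:.1%}")
print(f"Instruction RAM headroom (at least): {iram_margin:.1%}")
```

Under these figures, roughly a tenth of the processor's capacity and a little over a tenth of its instruction RAM remain unused.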
Kousuke IMAMURA Ryota HONDA Yoshifumi KAWAMURA Naoki MIURA Masami URANO Satoshi SHIGEMATSU Tetsuya MATSUMURA Yoshio MATSUDA
The development of an extremely efficient packet inspection algorithm for lookup engines is important in order to realize high throughput and to lower energy dissipation. In this paper, we propose a new lookup engine based on a combination of a mismatch detection circuit and a linked-list hash table. The engine has an automatic rule registration and deletion function; as a result, it is only necessary to input rules, and the various tables included in the circuits, such as the Mismatch Table, Index Table, and Rule Table, are automatically configured by the embedded hardware. This function utilizes the match/mismatch assessment used in normal packet inspection operations. An experimental chip was fabricated using 40-nm 8-metal CMOS process technology. The chip operates at a frequency of 100 MHz under a power supply voltage of VDD = 1.1 V. A throughput of 100 Mpacket/s (= 51.2 Gb/s) is obtained at an operating frequency of 100 MHz, which is three times greater than the throughput of 33 Mpacket/s obtained with a conventional lookup engine without a mismatch detection circuit. The measured energy dissipation was 1.58 pJ/b·Search.
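The throughput figures in this abstract are mutually consistent, as the short calculation below shows. It is only a sketch of the arithmetic: the constant names are illustrative, and the bits-per-packet value is derived from the quoted numbers rather than stated in the abstract.

```python
PACKET_RATE = 100e6        # packets per second, as quoted
THROUGHPUT_BPS = 51.2e9    # bits per second, as quoted
CLOCK_HZ = 100e6           # operating frequency, as quoted
BASELINE_RATE = 33e6       # conventional engine, packets per second

# Derived quantities (not stated explicitly in the abstract).
bits_per_packet = THROUGHPUT_BPS / PACKET_RATE
packets_per_cycle = PACKET_RATE / CLOCK_HZ
speedup = PACKET_RATE / BASELINE_RATE

print(f"bits inspected per packet: {bits_per_packet:.0f}")
print(f"packets per clock cycle:   {packets_per_cycle:.0f}")
print(f"speedup over baseline:     {speedup:.2f}x")
```

The quoted rates imply 512 inspected bits per packet and one packet per clock cycle, with a speedup of about 3.03, in line with the "three times" claim.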
Yoshifumi KAWAMURA Naoya OKADA Yoshio MATSUDA Tetsuya MATSUMURA Hiroshi MAKINO Kazutami ARIMOTO
A Field Programmable Sequencer and Memory (FPSM), which is a programmable unit exclusively optimized for peripherals on a microcontroller unit, is proposed. The FPSM functions not only as the peripherals but also as the standard built-in memory. The FPSM provides easier programmability with a smaller area overhead, especially when compared with the FPGA. The FPSM is implemented on the FPGA, and the programmability and performance for basic peripherals such as the 8-bit counter and 8-bit accuracy Pulse Width Modulation are emulated on the FPGA. Furthermore, the FPSM core with a 4K-bit SRAM is fabricated in 0.18 µm 5-metal CMOS process technology. The FPSM occupies half the area of the FPGA, and its power consumption is less than one-fifth.
Akira YAMADA Toyohiko YOSHIDA Tetsuya MATSUMURA Shin-ichi URAMOTO Koji TSUCHIHASHI Edgar HOLMANN
Integrating a 243 MHz dual-issue RISC processor core with a small set of dedicated hardware can create a single-chip system for real-time encoding and decoding for MPEG2 MP@ML (main profile at main level). The trade-off between software and dedicated hardware is very important in determining the performance of the system. This paper evaluates several MPEG2 encoding and decoding systems, focusing on both chip area and power consumption. For MPEG2 encoding, a newly introduced hybrid approach includes the processor core and the dedicated hardware that performs the discrete cosine transform (DCT), the inverse DCT (IDCT), variable length encoding (VLC) and block loading process. The estimated area for the encoder, 23.0 mm² using a 0.3-micrometer 1-poly 4-metal CMOS process, is 33% smaller than that of the dedicated hardware approach. The estimated power consumption for the encoder is 13% smaller than that of the dedicated hardware approach. The dual-issue RISC processor approach has the advantages of a small chip area, low power consumption, and ease of programming for multimedia applications.
Ayako HARADA Shin-ichi HATTORI Tadashi KASEZAWA Hidenori SATO Tetsuya MATSUMURA Satoshi KUMAKI Kazuya ISHIHARA Hiroshi SEGAWA Atsuo HANAMI Yoshinori MATSUURA Ken-ichi ASANO Toyohiko YOSHIDA Masahiko YOSHIMOTO Tokumichi MURAKAMI
An MPEG-2 422P@HL encoder chip set composed of a preprocessing LSI, an encoding LSI, and a motion estimation LSI is described. This chip set realizes a two-type scalability of picture resolution and quality, and executes a hierarchical coding control in the overall encoder system. Due to its scalable architecture, the chip set realizes a 422P@HL video encoder with multi-chip configuration. This single encoding LSI achieves 422P@ML video, audio, and system encoding in real time. It employs an advanced hybrid architecture with a 162 MHz media processor and dedicated video processing hardware. It also has dual communication ports for parallel processing with multi-chip configuration. Transferring of reconstructed data and macroblock characteristic data between neighboring encoder modules is executed via these ports. The preprocessing LSI is fabricated using 0.25 micron three-layer metal CMOS technology and integrates 560 K gates in an area of 12.0 mm × 12.0 mm. The encoding LSI is fabricated using 0.25 micron four-layer metal CMOS technology and integrates 11 million transistors in an area of 14.2 mm × 14.2 mm. The motion estimation LSI is fabricated using 0.35 micron three-layer metal CMOS technology. It integrates 1.9 million transistors in an area of 8.5 mm × 8.5 mm. This chip set makes various system configurations possible and allows for a compact and cost-effective video encoder with high picture quality.
Yu SUZUKI Masato ITO Satoshi KANDA Kousuke IMAMURA Yoshio MATSUDA Tetsuya MATSUMURA
This paper describes the design and implementation of a real-time optical flow processor using a single field-programmable gate array (FPGA) chip. By introducing the modified initial flow generation method, the successive over-relaxation (SOR) method for both layers, the optimization of the reciprocal operation method, and the image division method, it is now possible to both reduce hardware requirements and improve flow accuracy. Additionally, by introducing a pipeline structure to this processor, high-throughput hardware implementation could be achieved. The total logic cell (LC) count and the processor memory capacity are reduced by about 8% and 16%, respectively, compared to our previous hierarchical optical flow estimation (HOE) processor. The results of our evaluation confirm that this processor can perform real-time optical flow processing of wide extended graphics array (WXGA) video at 30 fps, running at 175.7 MHz, on a single FPGA.
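For readers unfamiliar with the successive over-relaxation (SOR) method named above, the following is a minimal generic sketch of the iteration on a small linear system. It is not the processor's data path or the paper's flow equations; the function name `sor_solve`, the relaxation factor, the iteration count, and the example system are all assumptions chosen for illustration.

```python
def sor_solve(a, b, omega=1.25, iters=200):
    """Solve a x = b by successive over-relaxation (generic textbook form).

    Each unknown is updated in place with a Gauss-Seidel estimate, then
    blended with its previous value using the relaxation factor omega.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            # Contribution of all other unknowns, using the latest values.
            sigma = sum(a[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / a[i][i]
            x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel
    return x

# Hypothetical 2x2 symmetric positive-definite system; exact solution (1/11, 7/11).
solution = sor_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)
```

With omega equal to 1 the update reduces to plain Gauss-Seidel; values between 1 and 2 over-relax the update, which can speed up convergence for suitable systems.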
Masahiko YOSHIMOTO Shin-ichi NAKAGAWA Tetsuya MATSUMURA Kazuya ISHIHARA Shin-ichi URAMOTO
This paper gives an overview of several design issues and solutions for the realization of MPEG2 encoder and decoder LSIs. ULSI technology and video-coding specific design have been able to actualize an MPEG2 encoder and decoder LSI with real-time capability, flexibility and cost effectiveness, though MPEG2 processing at MP@ML (Main Profile at Main Level) requires an enormous computation power of 10-200 GOPS depending on the motion estimation algorithm and the search range. Video coding processors, whose performance has been enhanced at the rate of one order of magnitude per 3 years, have reached the performance level required to implement MPEG2 encoding using a multiple-chip configuration. This has been achieved by a hybrid architecture with a video-oriented RISC and a hardware engine optimized for coding algorithms. Intensive circuit optimization was carried out for transform coding such as DCT and predictive coding with motion estimation. Now cost-effective MPEG2 decoders have begun to penetrate the multimedia market. There are two main design issues. One is the architectural and circuit design which minimizes the silicon area and power dissipation. The other is external DRAM control which makes efficient use of DRAM storage and bandwidth to reduce the system cost. Future trends in the deep submicron era will also be discussed. A single-chip MPEG2 MP@ML encoder is expected to appear in the 0.25 micron era at the latest. An MPEG2 MP@ML decoder could be compressed to an area of about 25 mm².
Hidehiro TAKATA Rei AKIYAMA Tadao YAMANAKA Haruyuki OHKUMA Yasue SUETSUGU Toshihiro KANAOKA Satoshi KUMAKI Kazuya ISHIHARA Atsuo HANAMI Tetsuya MATSUMURA Tetsuya WATANABE Yoshihide AJIOKA Yoshio MATSUDA Syuhei IWADE
An on-chip, 64-Mb, embedded, DRAM MPEG-2 encoder LSI with a multimedia processor has been developed. To implement this large-scale and high-speed LSI, we have developed the hierarchical skew control of multi-clocks, with timing verification, in which cross-talk noise is considered, and simple measures taken against the IR drop in the power lines through decoupling capacitors. As a result, the target performance of 263 MHz at 1.5 V has been successfully attained and verified, the cross-talk noise has been considered, and, in addition, it has become possible to restrain the IR drop to 166 mV in the 162 MHz operation block.
Saya OHIRA Naoki TSUCHIYA Tetsuya MATSUMURA
We propose a three-dimensional (3D) sound processor architecture for consumer applications that includes super-directional modulation intellectual property (IP) and 3D sound processing IP. In addition, we also propose an automatic design environment for 3D sound processing IP. This processor can generate realistic small sound fields in arbitrary spaces using ultrasound. In particular, in the 3D sound processing IP, in order to reproduce 3D audio, it is necessary to reproduce the personal frequency characteristics of complex head related transfer functions. For this reason, we have constructed an automatic design environment with high reconfigurability. This automatic design environment is based on high-level synthesis, and it is possible to automatically generate a C-based algorithm simulator and automatically synthesize the IP hardware by inputting a parameter description file for filter design. This automatic design environment can reduce the design period to approximately 1/5 as compared with conventional manual design. Applying the automatic design environment, a 3D sound processing IP was designed experimentally. The designed IP can be sufficiently applied to consumer applications from the viewpoints of hardware amount and power consumption.
Tetsuya MATSUMURA Masahiko YOSHIMOTO Atsushi MAEDA Yasutaka HORIBA
This paper describes a high-performance reconfigurable line memory macrocell for video signal processing ASICs. The macrocell features a three-transistor memory cell array with a divided word line structure for write word lines. The transistor size of the memory cell has been determined by analyzing access time to achieve a more than 50 MHz throughput rate for various aspect ratios. A testing circuit has been embedded in the macrocell, which offers the video-rate testing and high fault coverage with a minimum circuit count. Moreover the macrocell has high reconfigurability of word-length, bit-width and aspect ratio. A 1152 words × 8 bits line memory has been implemented experimentally using 1.0 µm CMOS technology. As a result, 60 MHz operation has been observed, allowing real time processing of HDTV signal. By applying the macrocells to HDTV system LSIs, the reconfigurability and usefulness of the testing circuits have been verified.
Tetsuya MATSUMURA Hiroshi SEGAWA Satoshi KUMAKI Yoshinori MATSUURA Atsuo HANAMI Kazuya ISHIHARA Shin-ichi NAKAGAWA Tadashi KASEZAWA Yoshihide AJIOKA Atsushi MAEDA Masahiko YOSHIMOTO Tadashi SUMI
This paper describes a chip set architecture and its implementation for a programmable MPEG2 MP@ML (main profile at main level) video encoder. The chip set features a functional partitioning architecture based on the MPEG2 layer structure. Using this partitioning scheme, an optimized system configuration with a double bus structure is proposed. In addition, a hybrid architecture with dual video-oriented on-chip RISC processors and dedicated hardware and a hierarchical pipeline scheme covering all layers are newly introduced to realize flexibility. Also, effective motion estimation is achieved by a scalable solution for high picture quality. Adopting these features, three kinds of VLSI have been developed using 0.5 micron double metal CMOS technology. The chip set consists of a controller-LSI (C-LSI), a macroblock level pixel processor-LSI (P-LSI) and a motion estimation-LSI (ME-LSI). The chip set combined with synchronous DRAMs (SDRAM) supports all the layer processing including rate-control and realizes real-time encoding for ITU-R-601 resolution video (720 × 480 pixels at 30 frames/s) with glueless logic. The exhaustive motion estimation capability is scalable up to 63.5 and 15.5 in the horizontal and vertical directions respectively. This chip set solution realizes a low-cost MPEG2 video encoder system with excellent video quality on a single PC extension board. The evaluation system and application development environment are also introduced.
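To illustrate what exhaustive motion estimation means in general, the following is a minimal software sketch of full-search block matching with a sum-of-absolute-differences (SAD) cost. It is a generic textbook form, not the ME-LSI's architecture: the function names, the tiny 8 × 8 test frame, the 2 × 2 block size, and the search range are assumptions for illustration, and the sketch tests integer-pel displacements only, whereas the fractional ranges quoted in the abstract suggest finer precision.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of cur at (bx, by)
    and the block of ref displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )

def full_search(cur, ref, bx, by, n, rx, ry):
    """Test every integer displacement within +/-rx and +/-ry.
    Returns (best_sad, dx, dy) for the lowest-cost candidate."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-ry, ry + 1):
        for dx in range(-rx, rx + 1):
            # Skip candidates whose reference block falls outside the frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + n > w or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best

# Hypothetical test data: a synthetic reference frame and a current frame whose
# block at (2, 2) is a copy of the reference block displaced by (+2, +1).
ref = [[x * x + 3 * y * y + x * y for x in range(8)] for y in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in range(2):
    for x in range(2):
        cur[2 + y][2 + x] = ref[3 + y][4 + x]

result = full_search(cur, ref, bx=2, by=2, n=2, rx=3, ry=3)
print(result)
```

The cost of this approach grows with the product of the horizontal and vertical search ranges, which is why a scalable search range matters for hardware cost.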