Author Search Result

[Author] Yoshinori TAKEUCHI (32 hits)

1-20 of 32 hits

  • Effectiveness of a High Speed Context Switching Method Using Register Bank

    Jun-ichi ITO  Takumi NAKANO  Yoshinori TAKEUCHI  Masaharu IMAI  

     
    PAPER-LSI Architecture

    Vol: E81-A  No: 12  Page(s): 2661-2667

    This paper proposes a method to reduce the context switching time using a register bank to store contexts of working tasks. Hardware cost and performance were measured by modeling the register bank and controller in VHDL. The following results were obtained: (1) The controller can be implemented at a much smaller hardware cost than the register bank, which is realized by an SRAM module. (2) Context switching time can be reduced to less than 50% of that of a software implementation. (3) Combining the proposed architecture with our previous work (RTOS implemented in HW) gives much higher performance for a hard real-time system.

  • Acceleration of Genetic Programming by Hierarchical Structure Learning: A Case Study on Image Recognition Program Synthesis

    Ukrit WATCHAREERUETAI  Tetsuya MATSUMOTO  Noboru OHNISHI  Hiroaki KUDO  Yoshinori TAKEUCHI  

     
    PAPER-Artificial Intelligence and Cognitive Science

    Vol: E92-D  No: 10  Page(s): 2094-2102

    We propose a learning strategy for accelerating the learning speed of genetic programming (GP), named hierarchical structure GP (HSGP). The HSGP exploits multiple learning nodes (LNs) which are connected in a hierarchical structure, e.g., a binary tree. Each LN runs a conventional evolutionary process to evolve its own population, and sends the evolved population to the connected higher-level LN. The lower-level LN evolves the population with a smaller subset of training data. The higher-level LN then integrates the evolved populations from the connected lower-level LNs, and evolves the integrated population further by using a larger subset of training data. In HSGP, evolutionary processes are sequentially executed from the bottom-level LNs to the top-level LN, which evolves with the entire training data. In the experiments, we adopt conventional GPs and the HSGPs to evolve image recognition programs for given training images. The results show that the use of hierarchical structure learning can significantly improve the learning speed of GPs. To achieve the same performance, the HSGPs need only 30-40% of the computation cost needed by conventional GPs.

  • Heuristic Instruction Scheduling Algorithm Using Available Distance for Partial Forwarding Processor

    Takuji HIEDA  Hiroaki TANAKA  Keishi SAKANUSHI  Yoshinori TAKEUCHI  Masaharu IMAI  

     
    PAPER-Embedded, Real-Time and Reconfigurable Systems

    Vol: E92-A  No: 12  Page(s): 3258-3267

    Partial forwarding is a design method that places forwarding paths on only part of a processor pipeline. Partial forwarding can reduce the hardware cost of a processor without performance loss. However, it requires a compiler whose instruction scheduler considers the partial forwarding structure of the target processor, since conventional scheduling algorithms cannot make the most of that structure. In this paper, we propose a heuristic instruction scheduling method for processors with a partial forwarding structure. The proposed algorithm uses available distance to schedule instructions in a way that suits the target partial forwarding processor. Experimental results show that the proposed method generates near-optimal solutions in practical time and that some of the optimized codes for the partial forwarding processor run in the shortest time among the target processors. They also show that the proposed method is superior to a hazard detection unit.

  • Code Efficiency Evaluation for Embedded Processors

    Morgan Hirosuke MIKI  Mamoru SAKAMOTO  Shingo MIYAMOTO  Yoshinori TAKEUCHI  Toyohiko YOSHIDA  Isao SHIRAKAWA  

     
    PAPER

    Vol: E85-A  No: 4  Page(s): 811-818

    This paper evaluates the code efficiency of the ARM, Java, and x86 instruction sets by compiling the SPEC CPU95/CPU2000/JVM98 and CaffeineMark benchmarks, from the aspects of code sizes, basic block sizes, instruction distributions, and average instruction lengths. As a result, mainly because (i) the Java architecture is a stack machine, (ii) there are only four local variables which can be accessed by a 1-byte instruction, and (iii) additional instructions are provided for the network security, the code efficiency of Java turns out to be inferior to that of ARM Thumb. Moreover, through this efficiency analysis it should be stressed that there exists the high potential of constructing a more efficient code architecture by taking minute account of the customization of an instruction set as well as the number of registers.

  • FOREWORD Open Access

    Yoshinori TAKEUCHI  

     
    FOREWORD

    Vol: E105-A  No: 3  Page(s): 436-436

  • Two Dimensional Space Partition Recursive Filtering Algorithm on Rectangular Processor Array

    Yoshinori TAKEUCHI  Hiroaki KUNIEDA  

     
    PAPER-Digital Signal Processing

    Vol: E74-A  No: 1  Page(s): 42-48

    This paper studies the method of parallel processing for two dimensional recursive filters on a multiprocessor system. Conventional recursive filterings are sequential and iterative local, i.e. global processing. We decompose their global processings into space partition processings with a few global communications. We derive an efficient parallel algorithm for two dimensional recursive filterings using Roesser's model and investigate their speed-up, rate, efficiency and degradation.

  • An Optimization Algorithm for High Performance ASIP Design with Considering the RAM and ROM Sizes

    Nguyen Ngoc BINH  Masaharu IMAI  Yoshinori TAKEUCHI  

     
    PAPER-Co-design

    Vol: E81-A  No: 12  Page(s): 2612-2620

    In designing ASIPs (Application Specific Integrated Processors), the papers published so far have mostly focused on the optimization of the CPU core and have not paid enough attention to optimizing the RAM and ROM sizes together with it. This paper overcomes this limitation and proposes an optimization algorithm to define the best ratio between the CPU core, RAM and ROM of an ASIP chip to achieve the highest performance while satisfying design constraints on the chip area. The partitioning problem is formalized as a combinatorial optimization problem that partitions the operations into hardware and software so that the performance of the designed ASIP is maximized under a given chip area constraint, where the chip area includes the HW cost of the register file for a given application program with an associated input data set. The optimization problem is parameterized so that it can be applied with different technologies to synthesize CPU cores, RAMs or ROMs. The experimental results show that the proposed algorithm is effective and efficient.

  • Parallel Processing Architecture Design for Two-Dimensional Image Processing Using Spatial Expansion of the Signal Flow Graph

    Tsuyoshi ISSHIKI  Yoshinori TAKEUCHI  Hiroaki KUNIEDA  

     
    PAPER

    Vol: E76-A  No: 3  Page(s): 337-348

    In this paper, a methodology for designing the architecture of a processor array for a wide class of image processing algorithms is proposed. The methodology is built on the concept of spatially expanding the SFG description, which enables us to handle the problem as merely one-dimensional signal processing. The problem of the I/O interface, which is critical in real-time processing, is also considered.

  • Synthesizable HDL Generation for Pipelined Processors from a Micro-Operation Description

    Makiko ITOH  Yoshinori TAKEUCHI  Masaharu IMAI  Akichika SHIOMI  

     
    PAPER

    Vol: E83-A  No: 3  Page(s): 394-400

    A synthesizable HDL generation method for pipelined processors is proposed. By using the proposed method, data-path and control logic descriptions of a target processor are generated from a clock-based instruction set specification. Experimental results demonstrate the feasibility of the proposed method and show that processor design time was drastically reduced compared with conventional RT-level manual design in HDL.

  • Segmentation of Depth-of-Field Images Based on the Response of ICA Filters

    Andre CAVALCANTE  Allan Kardec BARROS  Yoshinori TAKEUCHI  Noboru OHNISHI  

     
    LETTER-Image Recognition, Computer Vision

    Vol: E95-D  No: 4  Page(s): 1170-1173

    In this letter, a new approach to segment depth-of-field (DoF) images is proposed. The methodology is based on a two-stage model of visual neuron. The first stage is a retinal filtering by means of luminance normalizing non-linearity. The second stage is a V1-like filtering using filters estimated by independent component analysis (ICA). Segmented image is generated by the response activity of the neuron measured in terms of kurtosis. Results demonstrate that the model can discriminate image parts in different levels of depth-of-field. Comparison with other methodologies and limitations of the proposed methodology are also presented.

  • Separation of Narrow Bandwidth Spectral Light from Femtosecond Pulses Using Optical Coupler with Fiber Grating

    Asako BABA  Hitomi MORIYA  Shin-ichi WAKABAYASHI  Yukio TOYODA  Yoshinori TAKEUCHI  

     
    PAPER-Fibers

    Vol: E83-C  No: 6  Page(s): 824-829

    We have developed spectral separation devices for processing femtosecond pulses. These devices are based on an optical coupler structure with fiber gratings. In a computer simulation, we confirmed that these devices could extract <1 nm bandwidth light with 80% efficiency. We fabricated the spectral separation devices using single mode fibers and highly Ge-doped fibers. These devices successfully extracted narrow spectral light of 0.3 nm bandwidth with 37% efficiency from femtosecond pulses of 40 nm bandwidth. We also fabricated 2-channel spectral separation devices, which could extract the light from each grating channel.

  • Optical Encoding and Decoding of Femtosecond Pulses in the Spectral Domain Using Optical Coupler with Fiber Gratings

    Shin-ichi WAKABAYASHI  Hitomi MORIYA  Asako BABA  Yoshinori TAKEUCHI  

     
    PAPER-OTDM Transmission System, Optical Regeneration and Coding

    Vol: E85-C  No: 1  Page(s): 135-140

    We have developed optical encoding devices for processing femtosecond pulses. These devices are based on spectral separation devices and light modulators with fiber gratings. Experiments were made to encode a light pulse in the spectral domain. These experiments exploit the fact that a femtosecond light pulse has a very broad spectrum. An input femtosecond light pulse is decomposed into a series of wavelength components. Each wavelength component, with a narrow spectrum of <1 nm width, is successfully extracted into a single mode fiber. Light modulators corresponding to wavelength components are assigned to the 1st bit, the 2nd bit, the 3rd bit, ..., the nth bit, respectively. All of the encoded wavelength components are again recombined into a single time-varying signal and transmitted through an optical fiber. Decoding at the receiving site is made by the reverse operation. Encoding and decoding for 2-bit and 4-bit signals were demonstrated for a 200 fs input light pulse with about 40 nm spectral width.

  • Distributed Load Balancing Schemes for Parallel Video Encoding System

    Zhaochen HUANG  Yoshinori TAKEUCHI  Hiroaki KUNIEDA  

     
    PAPER-Parallel/Multidimensional Signal Processing

    Vol: E77-A  No: 5  Page(s): 923-930

    We present distributed load balancing mechanisms implemented on multiprocessor systems for real time video encoding, which dynamically equalize load amounts among PEs to cope with extensive computing requirements. The loosely coupled multiprocessor system, e.g. a torus connected one, is treated as the objective system. Two decentralized controlled load balancing algorithms are proposed, and mathematical analyses are provided to obtain some insights into our decentralized controlled mechanisms. We also prove, theoretically and experimentally, that the proposed algorithms are steady and effective.

  • A Compiler Generation Method for HW/SW Codesign Based on Configurable Processors

    Shinsuke KOBAYASHI  Kentaro MITA  Yoshinori TAKEUCHI  Masaharu IMAI  

     
    PAPER-Hardware/Software Codesign

    Vol: E85-A  No: 12  Page(s): 2586-2595

    This paper proposes a compiler generation method for PEAS-III (Practical Environment for ASIP development), which is a configurable processor development environment for application domain specific embedded systems. Using the PEAS-III system, not only the HDL description of a target processor but also its target compiler can be generated. Therefore, execution cycles and dynamic power consumption can be rapidly evaluated. Two processors and their derivatives were designed using the PEAS-III system in the experiment. Experimental results show that the trade-offs among area, performance and power consumption of processors were analyzed in about twelve hours and the optimal processor was selected under the design constraints by using generated compilers and processors.

  • A Performance Optimization Method for Pipelined ASIPs in Consideration of Clock Frequency

    Katsuya SHINOHARA  Norimasa OHTSUKI  Yoshinori TAKEUCHI  Masaharu IMAI  

     
    PAPER

    Vol: E82-A  No: 11  Page(s): 2356-2365

    This paper proposes an ASIP performance optimization method taking clock frequency into account. The performance of an instruction set processor can be measured using the execution time of an application program, which can be determined by the clock cycles to perform the application program divided by the applied clock frequency. Therefore, the clock frequency should also be tuned in order to maximize the performance of the processor under the given design constraints. Experimental results show that the proposed method determines an optimal combination of FUs considering clock frequency.

  • RHINE: Reconfigurable Multiprocessor System for Video CODEC

    Yoshinori TAKEUCHI  Zhao-Chen HUANG  Masatomo SAEKI  Hiroaki KUNIEDA  

     
    PAPER-Methods and Circuits for Signal Processing

    Vol: E76-A  No: 6  Page(s): 947-956

    This paper introduces the new application specific architecture RHINE (Reconfigurable Hierarchical Image Neo-multiprocessor Engine), a multiprocessor system for moving picture CODEC. The array processor is known to be well suited for data parallel processing such as image signal processing, which requires a vast amount of computation and applies identical instruction sequences to the data. However, the moving picture CODEC algorithm suffers from a large load imbalance when the processing is distributed over multiple processors with separated sub-images. Some load balancing techniques are indispensable in such applications for the highest speed-up. RHINE gives one of the optimal solutions for such load balancing due to its self-reconfigurable architecture. RHINE consists of a hierarchy of Block Processing Units (BPUs), each of which has a common bus architecture of multiprocessors with a block memory. Processors in a BPU move to another BPU according to the load imbalance between BPUs by switching the bus connection between BPUs. The advantage of the RHINE architecture is demonstrated by performance simulations for real moving pictures.

  • Incorporating Top-Down Guidance for Extracting Informative Patches for Image Classification

    Shuang BAI  Tetsuya MATSUMOTO  Yoshinori TAKEUCHI  Hiroaki KUDO  Noboru OHNISHI  

     
    LETTER-Pattern Recognition

    Vol: E95-D  No: 3  Page(s): 880-883

    In this letter, we introduce a novel patch sampling strategy for the task of image classification, which is fundamentally different from current patch sampling strategies. Top-down guidance learned from training images is used to guide patch sampling towards informative regions. Experimental results show that this approach achieved a noticeable improvement over baseline patch sampling strategies for the classification of both object categories and scene categories.

  • Deformable Part Model Based Arrhythmia Detection Using Time Domain Features

    Yuuka HIRAO  Yoshinori TAKEUCHI  Masaharu IMAI  Jaehoon YU  

     
    PAPER-Digital Signal Processing

    Vol: E100-A  No: 11  Page(s): 2221-2229

    Heart disease is one of the major causes of death in many advanced countries. For prevention or treatment of heart disease, getting an early diagnosis from a long period of electrocardiogram (ECG) examination is necessary. However, analyzing this large amount of data could be a large burden on medical experts. To reduce the burden and support the analysis, this paper proposes an arrhythmia detection method based on a deformable part model, which absorbs individual variation of the ECG waveform and enables the detection of various arrhythmias. Moreover, to detect arrhythmia with low processing delay, the proposed method utilizes only time domain features. In the experiments, the proposed method achieved an F-measure of 0.91 for arrhythmia detection.

  • Multi-Objective Genetic Programming with Redundancy-Regulations for Automatic Construction of Image Feature Extractors

    Ukrit WATCHAREERUETAI  Tetsuya MATSUMOTO  Yoshinori TAKEUCHI  Hiroaki KUDO  Noboru OHNISHI  

     
    PAPER-Biocybernetics, Neurocomputing

    Vol: E93-D  No: 9  Page(s): 2614-2625

    We propose a new multi-objective genetic programming (MOGP) method for automatic construction of image feature extraction programs (FEPs). The proposed method originates from a well-known multi-objective evolutionary algorithm (MOEA), i.e., NSGA-II. The key differences are that redundancy-regulation mechanisms are applied in three main processes of the MOGP, i.e., population truncation, sampling, and offspring generation, to improve population diversity as well as convergence rate. Experimental results indicate that the proposed MOGP-based FEP construction system outperforms the two conventional MOEAs (i.e., NSGA-II and SPEA2) for a test problem. Moreover, we compared the programs constructed by the proposed MOGP with four human-designed object recognition programs. The results show that the constructed programs are better than two human-designed methods and are comparable with the other two human-designed methods for the test problem.

  • VLSI Architecture for Real-Time Fractal Image Coding Processors

    Hideki YAMAUCHI  Yoshinori TAKEUCHI  Masaharu IMAI  

     
    PAPER

    Vol: E83-A  No: 3  Page(s): 452-458

    This paper proposes an efficient architecture for fractal image coding processors. The proposed architecture achieves high-speed image coding comparable to conventional JPEG processing. This architecture achieves less than 33.3 msec fractal image compression coding against a 512 × 512 pixel image and enables full-motion fractal image coding. The circuit size of the proposed architecture design is comparable to those of JPEG processors and much smaller than those of previously proposed fractal processors.
