Jun-ichi ITO Takumi NAKANO Yoshinori TAKEUCHI Masaharu IMAI
This paper proposes a method to reduce the context switching time using a register bank to store contexts of working tasks. Hardware cost and performance were measured by modeling the register bank and controller in VHDL. Following results were obtained: (1) The controller can be implemented with a much smaller amount of hardware cost compared to that of the register bank, which is realized by SRAM module. (2) Context switching time can be reduced to less than 50% compared to that by software implementation. (3) Combination of the proposed architecture with our previous work (RTOS implemented in HW) gives us much higher performance of a hard real-time system.
Ukrit WATCHAREERUETAI Tetsuya MATSUMOTO Noboru OHNISHI Hiroaki KUDO Yoshinori TAKEUCHI
We propose a learning strategy for acceleration in learning speed of genetic programming (GP), named hierarchical structure GP (HSGP). The HSGP exploits multiple learning nodes (LNs) which are connected in a hierarchical structure, e.g., a binary tree. Each LN runs conventional evolutionary process to evolve its own population, and sends the evolved population into the connected higher-level LN. The lower-level LN evolves the population with a smaller subset of training data. The higher-level LN then integrates the evolved population from the connected lower-level LNs together, and evolves the integrated population further by using a larger subset of training data. In HSGP, evolutionary processes are sequentially executed from the bottom-level LNs to the top-level LN which evolves with the entire training data. In the experiments, we adopt conventional GPs and the HSGPs to evolve image recognition programs for given training images. The results show that the use of hierarchical structure learning can significantly improve learning speed of GPs. To achieve the same performance, the HSGPs need only 30-40% of the computation cost needed by conventional GPs.
Takuji HIEDA Hiroaki TANAKA Keishi SAKANUSHI Yoshinori TAKEUCHI Masaharu IMAI
Partial forwarding is a design method to place forwarding paths on a part of processor pipeline. Hardware cost of processor can be reduced without performance loss by partial forwarding. However, compiler with the instruction scheduler which considers partial forwarding structure of the target processor is required since conventional scheduling algorithm cannot make the most of partial forwarding structure. In this paper, we propose a heuristic instruction scheduling method for processors with partial forwarding structure. The proposed algorithm uses available distance to schedule instructions which are suitable for the target partial forwarding processor. Experimental results show that the proposed method generates near-optimal solutions in practical time and some of the optimized codes for partial forwarding processor run in the shortest time among the target processors. It also shows that the proposed method is superior to hazard detection unit.
Morgan Hirosuke MIKI Mamoru SAKAMOTO Shingo MIYAMOTO Yoshinori TAKEUCHI Toyohiko YOSHIDA Isao SHIRAKAWA
This paper evaluates the code efficiency of the ARM, Java, and x86 instruction sets by compiling the SPEC CPU95/CPU2000/JVM98 and CaffeineMark benchmarks, from the aspects of code sizes, basic block sizes, instruction distributions, and average instruction lengths. As a result, mainly because (i) the Java architecture is a stack machine, (ii) there are only four local variables which can be accessed by a 1-byte instruction, and (iii) additional instructions are provided for the network security, the code efficiency of Java turns out to be inferior to that of ARM Thumb. Moreover, through this efficiency analysis it should be stressed that there exists the high potential of constructing a more efficient code architecture by taking minute account of the customization of an instruction set as well as the number of registers.
Yoshinori TAKEUCHI Hiroaki KUNIEDA
This paper studies the method of parallel processing for two dimensional recursive filters on a multiprocessor system. Conventional recursive filterings are sequential and iterative local, i.e. global processing. We decompose their global processings into space partition processings with a few global communications. We derive an efficient parallel algorithm for two dimensional recursive filterings using Roesser's model and investigate their speed-up, rate, efficiency and degradation.
Nguyen Ngoc BINH Masaharu IMAI Yoshinori TAKEUCHI
In designing ASIPs (Application Specific Integrated Processors), the papers investigated so far have almost focused on the optimization of the CPU core and did not pay enough attention to the optimization of the RAM and ROM sizes together. This paper overcomes this limitation and proposes an optimization algorithm to define the best ratio between the CPU core, RAM and ROM of an ASIP chip to achieve the highest performance while satisfying design constraints on the chip area. The partitioning problem is formalized as a combinatorial optimization problem that partitions the operations into hardware and software so that the performance of the designed ASIP is maximized under given chip area constraint, where the chip area includes the HW cost of the register file for a given application program with associated input data set. The optimization problem is parameterized so that it can be applied with different technologies to synthesize CPU cores, RAMs or ROMs. The experimental results show that the proposed algorithm is found to be effective and efficient.
Tsuyoshi ISSHIKI Yoshinori TAKEUCHI Hiroaki KUNIEDA
In this paper, a methodology for designing the architecture of the processor array for wide class of image processing algorithms is proposed. A concept of spatially expanding the SFG description which enables us to handle the problem as merely one-dimensional signal processing is used in constructing the methodology. Problem of I/O interface which is critical in real-time processing is also considered.
Makiko ITOH Yoshinori TAKEUCHI Masaharu IMAI Akichika SHIOMI
A synthesizable HDL generation method for pipelined processors is proposed. By using the proposed method, data-path and control logic descriptions of a target processor is generated from a clock based instruction set specification. From the experimental results, feasibility of the proposed method is evaluated and the amount of processor design time was drastically reduced than that of conventional RT level manual design in HDL.
Andre CAVALCANTE Allan Kardec BARROS Yoshinori TAKEUCHI Noboru OHNISHI
In this letter, a new approach to segment depth-of-field (DoF) images is proposed. The methodology is based on a two-stage model of visual neuron. The first stage is a retinal filtering by means of luminance normalizing non-linearity. The second stage is a V1-like filtering using filters estimated by independent component analysis (ICA). Segmented image is generated by the response activity of the neuron measured in terms of kurtosis. Results demonstrate that the model can discriminate image parts in different levels of depth-of-field. Comparison with other methodologies and limitations of the proposed methodology are also presented.
Asako BABA Hitomi MORIYA Shin-ichi WAKABAYASHI Yukio TOYODA Yoshinori TAKEUCHI
We have developed spectral separation devices for processing femtosecond pulses. These devices are based on an optical coupler structure with fiber gratings. In a computer simulation, we confirmed that these devices could extract <1 nm bandwidth light with 80% efficiency. We fabricated the spectral separation devices using single mode fibers and highly Ge-doped fibers. These devices successfully extracted narrow spectral light of 0.3 nm bandwidth with 37% efficiency from femtosecond pulses of 40 nm bandwidth. We also fabricated 2-channel spectral separation devices, which could extract the light from each grating channel.
Shin-ichi WAKABAYASHI Hitomi MORIYA Asako BABA Yoshinori TAKEUCHI
We have developed optical encoding devices for processing femtosecond pulses. These devices are based on spectral separation devices and light modulators with fiber gratings. Experiments were made to encode a light pulse in the spectral domain. These experiments utilize the characteristics that a femtosecond light pulse has a very broad spectrum. An input femtosecond light pulse is decomposed into a series of wavelength components. Each wavelength component with narrow spectra <1 nm width is successfully extracted into a single mode fiber. Light modulators corresponding to wavelength components are assigned to the 1st bit, the 2nd bit, the 3rd bit,
Zhaochen HUANG Yoshinori TAKEUCHI Hiroaki KUNIEDA
We present distributed load balancing mechanisms implemented on multiprocessor systems for real time video encoding, which dynamically equalize load amounts among PE's to cope with extensive computing requirements. The loosely coupled multiprocessor system, e.g. a torus connected one, is treated as the objective system. Two decentralized controlled load balancicg algorithms are proposed, and mathematical analyses are provided to obtain some insights of our decentralized controlled mechanisms. We also prove the proposed algorithms are steady and effective theoretically and experimentally.
Shinsuke KOBAYASHI Kentaro MITA Yoshinori TAKEUCHI Masaharu IMAI
This paper proposes a compiler generation method for PEAS-III (Practical Environment for ASIP development), which is a configurable processor development environment for application domain specific embedded systems. Using the PEAS-III system, not only the HDL description of a target processor but also its target compiler can be generated. Therefore, execution cycles and dynamic power consumption can be rapidly evaluated. Two processors and their derivatives were designed using the PEAS-III system in the experiment. Experimental results show that the trade-offs among area, performance and power consumption of processors were analyzed in about twelve hours and the optimal processor was selected under the design constraints by using generated compilers and processors.
Katsuya SHINOHARA Norimasa OHTSUKI Yoshinori TAKEUCHI Masaharu IMAI
This paper proposes an ASIP performance optimization method taking clock frequency into account. The performance of an instruction set processor can be measured using the execution time of an application program, which can be determined by the clock cycles to perform the application program divided by the applied clock frequency. Therefore, the clock frequency should also be tuned in order to maximize the performance of the processor under the given design constraints. Experimental results show that the proposed method determines an optimal combination of FUs considering clock frequency.
Yoshinori TAKEUCHI Zhao-Chen HUANG Masatomo SAEKI Hiroaki KUNIEDA
This paper introduces the new application specific architecture RHINE (Reconfigurable Hierarchical Image Neo-multiprocessor Engine) that is a multiprocessor system for moving picture CODEC. The array processor is known to be originally suited for data parallel processing such as image signal processing which requires vast amount of computations and has the identical instruction sequences on data. However, the moving picture CODEC algorithm suffers from the large load imbalance in the processings on multi-processors with the separated sub-images. Some load balancing techniques are indispensable in such applications for the highest speed-up. RHINE gives one of the optimal solutions for such a load balancing due to its feature of the self reconfigurable architecture. RHINE consists of Block Processing Units (BPU) hierarchically, in each of which has a common bus architecture of multiprocessors with a block memory. Processors in a BPU move to the other BPU according to the load imbalance between BPUs by switching the bus connection between BPUs. The advantage of RHINE architecture is demonstrated by showing performance simulations for real moving pictures.
Shuang BAI Tetsuya MATSUMOTO Yoshinori TAKEUCHI Hiroaki KUDO Noboru OHNISHI
In this letter, we introduce a novel patch sampling strategy for the task of image classification, which is fundamentally different from current patch sampling strategies. A top-down guidance learned from training images is used to guide patch sampling towards informative regions. Experiment results show that this approach achieved noticeable improvement over baseline patch sampling strategies for the classification of both object categories and scene categories.
Yuuka HIRAO Yoshinori TAKEUCHI Masaharu IMAI Jaehoon YU
Heart disease is one of the major causes of death in many advanced countries. For prevention or treatment of heart disease, getting an early diagnosis from a long time period of electrocardiogram (ECG) examination is necessary. However, it could be a large burden on medical experts to analyze this large amount of data. To reduce the burden and support the analysis, this paper proposes an arrhythmia detection method based on a deformable part model, which absorbs individual variation of ECG waveform and enables the detection of various arrhythmias. Moreover, to detect the arrhythmia in low processing delay, the proposed method only utilizes time domain features. In an experimental result, the proposed method achieved 0.91 F-measure for arrhythmia detection.
Ukrit WATCHAREERUETAI Tetsuya MATSUMOTO Yoshinori TAKEUCHI Hiroaki KUDO Noboru OHNISHI
We propose a new multi-objective genetic programming (MOGP) for automatic construction of image feature extraction programs (FEPs). The proposed method was originated from a well known multi-objective evolutionary algorithm (MOEA), i.e., NSGA-II. The key differences are that redundancy-regulation mechanisms are applied in three main processes of the MOGP, i.e., population truncation, sampling, and offspring generation, to improve population diversity as well as convergence rate. Experimental results indicate that the proposed MOGP-based FEP construction system outperforms the two conventional MOEAs (i.e., NSGA-II and SPEA2) for a test problem. Moreover, we compared the programs constructed by the proposed MOGP with four human-designed object recognition programs. The results show that the constructed programs are better than two human-designed methods and are comparable with the other two human-designed methods for the test problem.
Hideki YAMAUCHI Yoshinori TAKEUCHI Masaharu IMAI
This paper proposes an efficient architecture for fractal image coding processors. The proposed architecture achieves high-speed image coding comparable to conventional JPEG processing. This architecture achieves less than 33.3 msec fractal image compression coding against a 512 512 pixel image and enables full-motion fractal image coding. The circuit size of the proposed architecture design is comparable to those of JPEG processors and much smaller than those of previously proposed fractal processors.