IEICE globals.ieice.org Site

Keyword Search Result

[Keyword] co-design (18 hits)

1-18 hits

A 0.13 mJ/Prediction CIFAR-100 Fully Synthesizable Raster-Scan-Based Wired-Logic Processor in 16-nm FPGA Open Access
Dongzhu LI Zhijie ZHAN Rei SUMIKAWA Mototsugu HAMADA Atsutake KOSUGE Tadahiro KURODA

PAPER

Publicized:
2023/11/24
Vol:
E107-C No:6
Page(s):
155-162
A wired-logic deep neural network (DNN) processor achieving 0.13 mJ/prediction with 68.6% accuracy is developed on a single 16-nm field-programmable gate array (FPGA) chip. Compared with conventional von Neumann architecture DNN processors, the energy efficiency is greatly improved by eliminating DRAM/BRAM access. A technical challenge for conventional wired-logic processors is the large amount of hardware resources required to implement large-scale neural networks. To implement a large-scale convolutional neural network (CNN) on a single FPGA chip, two technologies are introduced: (1) a sparse neural network known as a non-linear neural network (NNN), and (2) a newly developed raster-scan wired-logic architecture. Furthermore, a novel high-level synthesis (HLS) technique for wired-logic processors is proposed. The proposed HLS technique enables the automatic generation of two key components: (1) Verilog hardware description language (HDL) code for a raster-scan-based wired-logic processor and (2) test bench code for conducting equivalence checking. The automated process significantly reduces the time and effort required for implementation and debugging. Compared with the state-of-the-art FPGA-based processor, 238 times better energy efficiency is achieved with only a slight decrease in accuracy on the CIFAR-100 task. In addition, 7 times better energy efficiency is achieved compared with the state-of-the-art network-optimized application-specific integrated circuit (ASIC).
MITA: Multi-Input Adaptive Activation Function for Accurate Binary Neural Network Hardware
Peiqi ZHANG Shinya TAKAMAEDA-YAMAZAKI

PAPER

Publicized:
2023/05/24
Vol:
E106-D No:12
Page(s):
2006-2014
Binary Neural Networks (BNNs) have binarized neuron and connection values so that their accelerators can be realized by extremely efficient hardware. However, there is a significant accuracy gap between BNNs and networks with wider bit-widths. Conventional BNNs binarize feature maps with static, globally unified thresholds, which makes the resulting bipolar image lose local details. This paper proposes a multi-input activation function to enable adaptive thresholding for binarizing feature maps: (a) At the algorithm level, instead of processing each input pixel independently, adaptive thresholding dynamically changes the threshold according to the pixels surrounding the target pixel. When optimizing weights, adaptive thresholding is equivalent to an accompanying depth-wise convolution between the normal convolution and binarization. The accompanying weights in the depth-wise filters are ternarized and optimized end-to-end. (b) At the hardware level, adaptive thresholding is realized through a multi-input activation function, which is compatible with common accelerator architectures. Compact activation hardware with only one extra accumulator is devised. By deploying the proposed method on an FPGA, a 4.1% accuracy improvement over the original BNN is achieved with only 1.1% extra LUT resources. Compared with state-of-the-art methods, the proposed idea further increases network accuracy by 0.8% on the CIFAR-10 dataset and 0.4% on the ImageNet dataset.
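The equivalence described in this abstract, adaptive thresholding as a ternary depth-wise filter placed before binarization, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `adaptive_binarize`, the 3x3 kernel size, the zero padding, and the sample values are all hypothetical.

```python
def adaptive_binarize(fmap, kernel):
    """Binarize a 2-D feature map to +1/-1 after a 3x3 ternary depth-wise filter.

    kernel entries are in {-1, 0, 1}. Neighbors outside the map count as 0
    (zero padding, an assumption made for this sketch).
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[dr + 1][dc + 1] * fmap[rr][cc]
            # Sign binarization of the filtered value.
            row.append(1 if acc >= 0 else -1)
        out.append(row)
    return out


# Center-only kernel: reduces to a static threshold at 0.
static_kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
# Center minus right neighbor: the right neighbor acts as the threshold.
adaptive_kernel = [[0, 0, 0], [0, 1, -1], [0, 0, 0]]

fmap = [[5, 3], [1, 2]]
static_result = adaptive_binarize(fmap, static_kernel)
adaptive_result = adaptive_binarize(fmap, adaptive_kernel)
```

With the center-only kernel every positive pixel maps to +1, whereas with the second kernel the pixel with value 1 maps to -1 because its right neighbor is larger, which is the sense in which the threshold adapts to surrounding pixels.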
RNA: An Accurate Residual Network Accelerator for Quantized and Reconstructed Deep Neural Networks
Cheng LUO Wei CAO Lingli WANG Philip H. W. LEONG

PAPER-Applications

Publicized:
2019/02/19
Vol:
E102-D No:5
Page(s):
1037-1045
With the continuous refinement of Deep Neural Networks (DNNs), a series of deep and complex networks such as Residual Networks (ResNets) show impressive prediction accuracy in image classification tasks. Unfortunately, the structural complexity and computational cost of residual networks make hardware implementation difficult. In this paper, we present the quantized and reconstructed deep neural network (QR-DNN) technique, which first inserts batch normalization (BN) layers in the network during training, and later removes them to facilitate efficient hardware implementation. Moreover, an accurate and efficient residual network accelerator (RNA) is presented based on QR-DNN with batch-normalization-free structures and weights represented in a logarithmic number system. RNA employs a systolic array architecture to perform shift-and-accumulate operations instead of multiplication operations. QR-DNN is shown to achieve a 1∼2% improvement in accuracy over existing techniques, and RNA over previous best fixed-point accelerators. An FPGA implementation on a Xilinx Zynq XC7Z045 device achieves 804.03 GOPS, 104.15 FPS and 91.41% top-5 accuracy for the ResNet-50 benchmark, and state-of-the-art results are also reported for AlexNet and VGG.
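The idea of replacing multiplications with shift-and-accumulate operations when weights are stored in a logarithmic number system can be sketched as follows. This is only an illustration: the `(sign, exp)` weight encoding, the restriction to non-negative exponents, and the function name `shift_accumulate` are assumptions for the example and are not taken from the paper.

```python
def shift_accumulate(activations, weights_log):
    """Dot product in which every weight is sign * 2**exp, using shifts only.

    activations: list of ints
    weights_log: list of (sign, exp) pairs, sign in {-1, 0, 1} and exp >= 0
                 (hypothetical encoding chosen for this sketch)
    """
    acc = 0
    for a, (sign, exp) in zip(activations, weights_log):
        if sign == 0:
            continue  # zero weight contributes nothing
        term = a << exp  # same as a * 2**exp, without a multiplier
        acc += term if sign > 0 else -term
    return acc


acts = [3, 5, 2]
wts = [(1, 2), (-1, 0), (0, 0)]  # weights 4, -1, 0
result = shift_accumulate(acts, wts)
```

A hardware design would more likely use fixed-point activations and right shifts for fractional weights; the sketch keeps to integer left shifts so that the result can be compared directly against an ordinary multiply-accumulate.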
Register-Based Process Virtual Machine Acceleration Using Hardware Extension with Hybrid Execution
Surachai THONGKAEW Tsuyoshi ISSHIKI Dongju LI Hiroaki KUNIEDA

PAPER-High-Level Synthesis and System-Level Design

Vol:
E98-A No:12
Page(s):
2505-2518
A Process Virtual Machine (VM) is software that runs applications inside an operating system. Its purpose is to provide a platform-independent programming environment that abstracts away details of the underlying hardware and operating system and allows bytecode (portable code) to be executed in the same way on any platform. Process VMs are implemented using an interpreter that interprets bytecode instead of directly executing host machine code. Thus, bytecode execution is slower than execution of a compiled program. Several techniques, including the “Fetch/Decode Hardware Extension” from our previous paper, have been proposed to speed up interpretation in Process VMs. In this paper, we propose an additional methodology, the “Hardware Extension with Hybrid Execution”, to further enhance the performance of Process VM interpretation, focusing on the register-based model. This new technique provides an additional decoder that classifies bytecodes as either simple or complex instructions. With “Hybrid Execution”, a simple instruction is executed directly on the hardware of the native processor, while a complex instruction is emulated by the “extra optimized bytecode software handler” of the native processor. To eliminate the overhead of retrieving and storing operands in memory, we use physical registers instead of (low-address) virtual registers. Moreover, the combination of three techniques, delay scheduling, mode predictor hardware, and a branch/goto controller, eliminates all of the mode-switching overhead between native mode and bytecode mode. The experimental results show that execution speed improves by up to 16.9x for arithmetic instructions, 16.1x for loop and conditional instructions, and 3.1x for method invocation and return instructions. The approximate size of the proposed hardware extension is 0.04 mm² (equivalent to 14.81k gates), and it consumes only 0.24 mW of additional power. The stated results are obtained from logic synthesis using TSMC 90 nm technology at 200 MHz.
AC Power Supply Noise Simulation of CMOS Microprocessor with LSI Chip-Package-Board Integrated Model
Kumpei YOSHIKAWA Kouji ICHIKAWA Makoto NAGATA

PAPER

Vol:
E97-C No:4
Page(s):
264-271
An LSI chip-package-board integrated power noise simulation model and its validity are discussed in this paper. A unified power delivery network (PDN) of the LSI chip, package, and printed circuit board (PCB) is connected with on-chip power supply current models based on a capacitor charging expression. The proposed modeling flow is demonstrated for a 32-bit microprocessor in a 1.0 V 90 nm CMOS technology. The PDN of the system, which includes a chip, bonding wires, and a printed circuit board, is modeled as an equivalent circuit. The on-chip power supply noise monitoring technique and the magnetic probe method are applied to validate the simulation results. Simulations and measurements explore power supply noise generation and its dependence on operating frequency over a wide range from 10 MHz to 300 MHz, under dynamic frequency scaling, and in long-duration operation with various operation codes. It is confirmed that the proposed power supply noise simulation model is helpful for noise estimation throughout the design phase of the LSI system.
Co-simulation of On-Chip and On-Board AC Power Noise of CMOS Digital Circuits
Kumpei YOSHIKAWA Yuta SASAKI Kouji ICHIKAWA Yoshiyuki SAITO Makoto NAGATA

PAPER-Device and Circuit Modeling and Analysis

Vol:
E95-A No:12
Page(s):
2284-2291
Capacitor charging modeling efficiently and accurately represents the power consumption current of CMOS digital circuits and enables co-simulation of AC power noise, including the interaction with the on-chip and on-board integrated power delivery network (PDN). It is clearly demonstrated that the AC power noise is dominantly characterized by the frequency-dependent impedance of the PDN and by the operating frequency of the circuits. A 65 nm CMOS chip exhibits AC power noise components closely related to the parallel resonance of the PDN seen from the on-chip digital circuits. An on-chip noise monitor measures the in-circuit power supply voltage, while near-field magnetic probing derives the on-board power supply current. The proposed co-simulation matches the power noise measurements well. The proposed AC noise co-simulation is applicable to the design of PDNs for on-chip power supply integrity (PSI) and off-chip electromagnetic compatibility (EMC).
High-Level Synthesis of Software Function Calls
Masanari NISHIMURA Nagisa ISHIURA Yoshiyuki ISHIMORI Hiroyuki KANBARA Hiroyuki TOMIYAMA

LETTER-High-Level Synthesis and System-Level Design

Vol:
E91-A No:12
Page(s):
3556-3558
This letter presents a novel framework for high-level synthesis in which hardware modules synthesized from functions in a given ANSI-C program can call the other software functions in the program. This enables high-level synthesis from C programs that contain calls to hard-to-synthesize functions, such as dynamic memory management, I/O requests, or very large and complex functions. A single-thread implementation scheme is shown, whose correctness has been verified through register transfer level simulation.
CPU Model-Based Mechatronics/Hardware/Software Co-design Technology for Real-Time Embedded Control Systems
Makoto ISHIKAWA George SAIKALIS Shigeru OHO

PAPER-VLSI Design Technology

Vol:
E90-C No:10
Page(s):
1992-2001
We review practical case studies of a method for developing highly reliable real-time embedded control systems using CPU model-based hardware/software co-simulation. We take an approach that enables us to fully simulate a virtual mechanical control system including a mechatronics plant, microcontroller hardware, and object-code-level software. This full virtual system approach simulates control system behavior, especially that of the microcontroller hardware and software. It enables design space exploration of the microarchitecture, control design validation, robustness evaluation of the system, and software optimization before component design. It also avoids potential problems. The advantage of this work is that it comprises all the components in a typical control system, enabling designers to analyze effects across different domains, for example the mechanical behavior that results from differences in controller microarchitecture. To further improve system design, evaluation, and analysis, we implemented an integrated behavior analyzer in the development environment. This analyzer can graphically display processor behavior, such as task-level CPU load, interrupt statistics, and software variable transition charts, during the simulation without affecting simulation results. It also provides useful information on the system behavior. This virtual system analysis does not require software modification, does not change the control timing, and does not require any processing power from the target microcontroller. Therefore, this method is suitable for real-time embedded control system design, in particular automotive control system design, which requires a high level of reliability, robustness, quality, and safety. In this study, a Renesas SH-2A microcontroller model was developed on the CoMET™ platform from VaST Systems Technology. An electronic throttle control (ETC) system and an engine control system were chosen to prove this concept. The electronic throttle body (ETB) model on the Saber® simulator from Synopsys® and the engine model on the MATLAB®/Simulink® simulator from MathWorks can be simulated with the SH-2A model using a newly developed co-simulation interface between MATLAB®/Simulink® and CoMET™. Although the SH-2A chip was still being developed while the project was being executed, we were able to complete the OSEK OS development, control software design, and verification of the entire system using the virtual environment. After a working sample chip was released in a later stage of the project, we found that the software could run on both the actual ETC system and the engine control system without critical problems. This demonstrates that our models and simulation environment are sufficiently credible and trustworthy.
Symbolic Simulation Heuristics for High-Level Hardware Descriptions Including Uninterpreted Functions
Kiyoharu HAMAGUCHI

LETTER

Vol:
E87-D No:3
Page(s):
637-641
This letter handles symbolic simulation for high-level hardware design descriptions including uninterpreted functions. Two new heuristics are introduced, which are named "symbolic function table" and "synchronization". In the experiment, the equivalence of a hardware/software codesign was checked up to a given finite number of cycles, which is composed of a behavioral design, that is, a small DSP program written in C, and its register-transfer-level implementation, a VLIW architecture with an assembly program. Our symbolic simulator succeeded in checking the equivalence of the two descriptions which were not tractable without the heuristics.
Performance Estimation at Architecture Level for Embedded Systems
Hiroshi MIZUNO Hiroyuki KOBAYASHI Takao ONOYE Isao SHIRAKAWA

PAPER-Performance Estimation

Vol:
E85-A No:12
Page(s):
2636-2644
This paper devises an approach to the performance estimation of an embedded hardware-software codesign system at the architecture level, which aims to optimize the hardware-software configuration in terms of processing time, power dissipation, and hardware cost. A distinctive feature of this approach is that it constructs a performance estimation model specific to each component of an embedded system, such as the CPU core, RAM/ROM, cache memory, and application-specific hardware, taking into account not only the functional performance but also the data transfer. The proposed estimation schemes are incorporated into an existing instruction set simulator, so that the actual performance can be estimated accurately at the architecture level. The experimental results demonstrate that the performance estimation approach enables precise design decisions at the architecture level, which greatly contributes to enhancing design capability, particularly for mobile appliances.
Verifying Signal-Transition Consistency of High-Level Designs Based on Symbolic Simulation
Kiyoharu HAMAGUCHI Hidekazu URUSHIHARA Toshinobu KASHIWABARA

PAPER-Verification

Vol:
E85-D No:10
Page(s):
1587-1594
This paper deals with formal verification of high-level designs, in particular, symbolic comparison of register-transfer-level descriptions and behavioral descriptions. We use state machines extended by quantifier-free first-order logic with equality, as models of those descriptions. We cannot adopt the classical notion of equivalence for state machines, because the signals in the corresponding outputs of such two descriptions do not change in the same way. This paper defines a new notion of consistency based on signal-transitions of the corresponding outputs, and proposes an algorithm for checking consistency of those descriptions, up to a limited number of steps from initial states. As an example of high-level designs, we take a simple hardware/software codesign. A C program for digital signal processing called PARCOR filter was compared with its corresponding design given as a register-transfer-level description, which is composed of a VLIW architecture and assembly code. Since this example terminates within approximately 4500 steps, symbolic exploration of a finite number of steps is sufficient to verify the descriptions. Our prototype verifier succeeded in the verification of this example in 31 minutes.
Proposal of a Multi-Threaded Processor Architecture for Embedded Systems and Its Evaluation
Shinsuke KOBAYASHI Yoshinori TAKEUCHI Akira KITAJIMA Masaharu IMAI

PAPER

Vol:
E84-A No:3
Page(s):
748-754
In this paper, a multi-threaded processor architecture for embedded systems is proposed and evaluated in comparison with other processors for embedded systems. The experimental results show the trade-off between hardware cost and execution time among the processors. By taking the proposed multi-threaded processor into account as an embedded processor, the design space of embedded systems is enlarged and a more suitable architecture can be selected under given design constraints.
Three-Layer Cooperative Architecture for MPEG-2 Video Encoder LSI
Mitsuo IKEDA Toshio KONDO Koyo NITTA Kazuhito SUGURI Takeshi YOSHITOME Toshihiro MINAMI Jiro NAGANUMA Takeshi OGURA

PAPER

Vol:
E83-C No:2
Page(s):
170-178
This paper presents an architecture for a single-chip MPEG-2 video encoder and demonstrates its flexibility and usefulness. The architecture based on three-layer cooperation provides flexible data-transfer that improves the encoder from the standpoints of versatility, scalability, and video quality. The LSI was successfully fabricated in the 0.25-µm four-metal CMOS process. Its small size and its low power consumption make it ideal for a wide range of applications, such as DVD recorders, PC-card encoders and HDTV encoders.
A Performance Optimization Method for Pipelined ASIPs in Consideration of Clock Frequency
Katsuya SHINOHARA Norimasa OHTSUKI Yoshinori TAKEUCHI Masaharu IMAI

PAPER

Vol:
E82-A No:11
Page(s):
2356-2365
This paper proposes an ASIP performance optimization method taking clock frequency into account. The performance of an instruction set processor can be measured using the execution time of an application program, which can be determined by the clock cycles to perform the application program divided by the applied clock frequency. Therefore, the clock frequency should also be tuned in order to maximize the performance of the processor under the given design constraints. Experimental results show that the proposed method determines an optimal combination of FUs considering clock frequency.
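The relation stated in this abstract, execution time equals the clock cycles needed by the application divided by the clock frequency, can be illustrated with a short sketch. The candidate configurations and their numbers below are hypothetical and are not results from the paper; they only show why the design with the fewest cycles is not necessarily the fastest.

```python
# Hypothetical candidates: (name, clock cycles for the application, clock frequency in Hz).
# These values are illustrative assumptions, not data from the paper.
candidates = [
    ("few_FUs", 1_200_000, 200e6),   # more cycles, but a faster clock
    ("many_FUs", 1_000_000, 150e6),  # fewer cycles, but a slower clock
]


def execution_time(cycles, freq_hz):
    """Execution time in seconds: clock cycles divided by clock frequency."""
    return cycles / freq_hz


def fastest(cands):
    """Return the candidate tuple with the smallest execution time."""
    return min(cands, key=lambda c: execution_time(c[1], c[2]))


best = fastest(candidates)
```

In this made-up case the configuration with more cycles wins because its higher clock frequency more than compensates, which is why cycle count and clock frequency have to be considered together.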
A Method for Design of Embedded Systems for Multimedia Applications
Katsuhiko SEO Hisao KOIZUMI Barry SHACKLEFORD Masashi MORI Takashi KUSUHARA Hirotaka KIMURA Fumio SUZUKI

PAPER

Vol:
E81-C No:5
Page(s):
725-732
This paper proposes a top-down co-verification approach for the design of embedded systems composed of both hardware and software for multimedia applications. To realize an embedded system optimized in cost, performance, power consumption, and flexibility, hardware/software co-design becomes essential. In this top-down co-design flow, a target design is verified at three different levels: (1) algorithmic, (2) implementation, and (3) experimental. We have developed a methodology of top-down co-verification, which consists of system-level simulation at the algorithmic level, two types of co-simulation at the implementation level, and co-emulation at the experimental level. We have realized an environment optimized for verification performance by employing verification models appropriate to each verification stage, and an efficient top-down environment by introducing the component logical bus architecture as the interface between hardware and software. Through actual application to an image compression and expansion system, the possibility of efficient co-verification was demonstrated.
Polling-Based Real-Time Software for MPEG2 System Protocol LSIs
Jiro NAGANUMA Makoto ENDO

PAPER

Vol:
E81-C No:5
Page(s):
695-701
This paper proposes polling-based real-time software for MPEG2 System protocol LSIs, which are typical embedded real-time systems on a chip, and demonstrates its performance and usefulness. The polling-based real-time software is designed and optimized by analyzing application-specific function requirements and deciding the scheduling intervals and execution cycles of each task. It requires neither hardware for multiple interrupt handling nor software for heavy context switching. The polling-based approach provides sufficient performance without any hardware or software overhead for a real-time application like the MPEG2 System protocol.
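The general shape of a polling loop with fixed scheduling intervals, in which each task runs to completion and no interrupts or context switches are involved, can be sketched as follows. This is a generic illustration and not the software described in the paper; the task names, the tick-based intervals, and the function `run_polling_loop` are hypothetical.

```python
def run_polling_loop(tasks, total_ticks):
    """Run a fixed polling loop for total_ticks iterations.

    tasks maps a task name to (interval_ticks, callable). A task is called,
    and runs to completion, whenever the tick counter is a multiple of its
    interval. Returns a log of (tick, task_name) pairs in execution order.
    """
    log = []
    for tick in range(total_ticks):
        for name, (interval, fn) in tasks.items():
            if tick % interval == 0:
                fn()
                log.append((tick, name))
    return log


counts = {"parse_header": 0, "update_clock": 0}


def parse_header():
    counts["parse_header"] += 1


def update_clock():
    counts["update_clock"] += 1


# Hypothetical tasks polled every 2 and every 3 ticks respectively.
schedule = {"parse_header": (2, parse_header), "update_clock": (3, update_clock)}
trace = run_polling_loop(schedule, 6)
```

Because the intervals are decided in advance, the order and timing of task execution are fully determined by the loop itself, which is what removes the need for interrupt handling and context switching in this style of design.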
Top-Down Co-simulation of Hardware/Software Co-designs for Embedded Systems Based Upon a Component Logical Bus Architecture
Katsuhiko SEO Hisao KOIZUMI Barry SHACKLEFORD Mitsuhiro YASUDA Masashi MORI Fumio SUZUKI

PAPER

Vol:
E80-A No:10
Page(s):
1834-1841
We propose a top-down approach for co-simulation of hardware/software co-designs for embedded systems and introduce a component logical bus architecture as an interface between software components implemented by processors and hardware components implemented by custom logic circuits. Co-simulation using a component logical bus architecture is possible in the same environment from the stage at which the processor is not yet finalized to the stage at which the processor is modeled in register transfer language. Models based upon a component logical bus architecture can be circulated and reused. We further describe experimental results of our approach.
A Proposal for a Co-design Method in Control Systems Using Combination of Models
Hisao KOIZUMI Katsuhiko SEO Fumio SUZUKI Yoshisuke OHTSURU Hiroto YASUURA

PAPER-System Design

Vol:
E78-D No:3
Page(s):
237-247
In this paper we propose a co-design method for control systems using a combination of models. By "co-design," we mean a cooperative design method in which the behavior of the entire system is simulated as a single model while parameters of the system are being optimized. Our co-design method enables the various subsystems in the system, which in the traditional design method are designed independently as tasks assigned to different designers, to be designed simultaneously in a unified, cooperative way from the system-wide perspective of a system designer. Our proposed method combines models of the controlling and controlled subsystems into a single model for the behavior of the entire control system. After the optimum control conditions are determined through simulation of the combined models, based on the corresponding algorithms and parameters, ASIC design proceeds quickly with accurate verification using iterative replacement of the behavior model by the electronic circuit model. To evaluate the proposed method, we implemented a design environment. We then applied our method to the design of ASICs in three test cases (in a control system and in audio-visual systems) to investigate its effectiveness. This paper introduces the concepts of the proposed co-design method, the design environment, and the experimental results, and points out new issues for system design.
