Keyword Search Result

[Keyword] embedded system(74hit)

1-20hit(74hit)

  • A Two-Phase Algorithm for Reliable and Energy-Efficient Heterogeneous Embedded Systems Open Access

    Hongzhi XU  Binlian ZHANG  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2024/05/27
      Vol:
    E107-D No:10
      Page(s):
    1285-1296

    Reliability is an important figure of merit of the system and it must be satisfied in safety-critical applications. This paper considers parallel applications on heterogeneous embedded systems and proposes a two-phase algorithm framework to minimize energy consumption for satisfying applications’ reliability requirement. The first phase is for initial assignment and the second phase is for either satisfying the reliability requirement or improving energy efficiency. Specifically, when the application’s reliability requirement cannot be achieved via the initial assignment, an algorithm for enhancing the reliability of tasks is designed to satisfy the application’s reliability requirement. Considering that the reliability of initial assignment may exceed the application’s reliability requirement, an algorithm for reducing the execution frequency of tasks is designed to improve energy efficiency. The proposed algorithms are compared with existing algorithms by using real parallel applications. Experimental results demonstrate that the proposed algorithms consume less energy while satisfying the application’s reliability requirements.

  • Low-Complexity and Accurate Noise Suppression Based on an a Priori SNR Model for Robust Speech Recognition on Embedded Systems and Its Evaluation in a Car Environment

    Masanori TSUJIKAWA  Yoshinobu KAJIKAWA  

     
    PAPER-Digital Signal Processing

      Publicized:
    2023/02/28
      Vol:
    E106-A No:9
      Page(s):
    1224-1233

    In this paper, we propose a low-complexity and accurate noise suppression based on an a priori SNR (Speech to Noise Ratio) model for greater robustness w.r.t. short-term noise-fluctuation. The a priori SNR, the ratio of speech spectra and noise spectra in the spectral domain, represents the difference between speech features and noise features in the feature domain, including the mel-cepstral domain and the logarithmic power spectral domain. This is because logarithmic operations are used for domain conversions. Therefore, an a priori SNR model can easily be expressed in terms of the difference between the speech model and the noise model, which are modeled by the Gaussian mixture models, and it can be generated with low computational cost. By using a priori SNRs accurately estimated on the basis of an a priori SNR model, it is possible to calculate accurate coefficients of noise suppression filters taking into account the variance of noise, without serious increase in computational cost over that of a conventional model-based Wiener filter (MBW). We have conducted in-car speech recognition evaluation using the CENSREC-2 database, and a comparison of the proposed method with a conventional MBW showed that the recognition error rate for all noise environments was reduced by 9%, and that, notably, that for audio-noise environments was reduced by 11%. We show that the proposed method can be processed with low levels of computational and memory resources through implementation on a digital signal processor.

  • Temporal Ensemble SSDLite: Exploiting Temporal Correlation in Video for Accurate Object Detection

    Lukas NAKAMURA  Hiromitsu AWANO  

     
    PAPER-Vision

      Publicized:
    2022/01/18
      Vol:
    E105-A No:7
      Page(s):
    1082-1090

    We propose “Temporal Ensemble SSDLite,” a new method for video object detection that boosts accuracy while maintaining detection speed and energy consumption. Object detection for video is becoming increasingly important as a core part of applications in robotics, autonomous driving and many other promising fields. Many of these applications require high accuracy and speed to be viable, but are used in compute- and energy-restricted environments. Therefore, new methods that increase the overall performance of video object detection, i.e., accuracy and speed, have to be developed. To increase accuracy we use ensembling, the machine learning method of combining the predictions of multiple different models. The drawback of ensembling is the increased computational cost, which is proportional to the number of models used. We overcome this drawback by deploying our ensemble temporally, meaning we run inference with only a single model at each frame, cycling through our ensemble of models frame by frame. Then, we combine the predictions for the last N frames, where N is the number of models in our ensemble, through non-max-suppression. This is possible because close frames in a video are extremely similar due to temporal correlation. As a result, we increase accuracy through the ensemble while running inference with only a single model at each frame and therefore keeping the detection speed. To evaluate the proposal, we measure the accuracy, detection speed and energy consumption on the Google Edge TPU, a machine learning inference accelerator, with the ImageNet VID dataset. Our results demonstrate an accuracy boost of up to 4.9% while maintaining real-time detection speed and an energy consumption of 181mJ per image.
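
    The temporal scheduling described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `TemporalEnsemble`, the detection format `(box, score)`, the IoU threshold of 0.5, and the class-agnostic greedy non-max-suppression are all assumptions made for illustration. Each "model" is any callable that maps a frame to a list of detections.

```python
from collections import deque

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(dets, iou_thr=0.5):
    """Greedy, class-agnostic non-max-suppression over (box, score) pairs."""
    kept = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score))
    return kept

class TemporalEnsemble:
    """Hypothetical helper: one model per frame, merge the last N frames."""

    def __init__(self, models, iou_thr=0.5):
        self.models = list(models)
        # Keep only the detections of the last N frames (N = number of models).
        self.history = deque(maxlen=len(self.models))
        self.frame_idx = 0
        self.iou_thr = iou_thr

    def step(self, frame):
        # Run exactly one model on this frame, cycling through the ensemble.
        model = self.models[self.frame_idx % len(self.models)]
        self.frame_idx += 1
        self.history.append(model(frame))
        # Merge the detections of the last N frames through NMS.
        merged = [d for dets in self.history for d in dets]
        return nms(merged, self.iou_thr)
```

    The per-frame cost stays at one model invocation plus the merge step; the sketch relies on consecutive frames being similar enough that boxes from neighbouring frames overlap, which is the temporal-correlation assumption stated in the abstract.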

  • Weight Sparseness for a Feature-Map-Split-CNN Toward Low-Cost Embedded FPGAs

    Akira JINGUJI  Shimpei SATO  Hiroki NAKAHARA  

     
    PAPER

      Publicized:
    2021/09/27
      Vol:
    E104-D No:12
      Page(s):
    2040-2047

    Convolutional neural networks (CNNs) have a high recognition rate in image recognition and are used in embedded systems such as smartphones, robots and self-driving cars. Low-end FPGAs are candidates for embedded image recognition platforms because they achieve real-time performance at a low cost. However, a CNN has a significant number of parameters called weights and internal data called feature maps, which pose a challenge for FPGAs in terms of performance and memory capacity. To solve these problems, we exploit a split-CNN and weight sparseness. The split-CNN reduces the memory footprint by splitting the feature map into smaller patches and allows the feature map to be stored in the FPGA's high-throughput on-chip memory. Weight sparseness reduces computational costs and achieves even higher performance. We designed a dedicated architecture for a sparse CNN and a memory buffering schedule for a split-CNN and implemented them on the PYNQ-Z1 FPGA board with a low-end FPGA. An experiment on classification using VGG16 shows that our implementation is 3.1 times faster than the GPU, and 5.4 times faster than an existing FPGA implementation.

  • Non-Volatile Main Memory Emulator for Embedded Systems Employing Three NVMM Behaviour Models

    Yu OMORI  Keiji KIMURA  

     
    PAPER-Computer System

      Publicized:
    2021/02/05
      Vol:
    E104-D No:5
      Page(s):
    697-708

    Emerging byte-addressable non-volatile memory devices attract much attention. A non-volatile main memory (NVMM) built on them enables larger memory size and lower power consumption than a traditional DRAM main memory. To fully utilize an NVMM, both software and hardware must be cooperatively optimized. Simultaneously, even focusing on a memory module, its micro architecture is still being developed though real non-volatile memory modules, such as Intel Optane DC persistent memory (DCPMM), have been on the market. Looking at existing NVMM evaluation environments, software simulators can evaluate various micro architectures at the cost of long simulation time. Emulators can evaluate the whole system fast with less flexibility in their configuration than simulators. Thus, an NVMM emulator that can realize flexible and fast system evaluation still has an important role in exploring the optimal system. In this paper, we introduce an NVMM emulator for embedded systems and explore a direction of optimization techniques for NVMMs by using it. It is implemented on an SoC-FPGA board employing three NVMM behaviour models: coarse-grain, fine-grain and DCPMM-based. The coarse and fine models enable NVMM performance evaluations based on extensions of traditional DRAM behaviour. The DCPMM-based model emulates the behaviour of a real DCPMM. A whole evaluation environment is also provided, including Linux kernel modifications and several runtime functions. We first validate the developed emulator with an existing NVMM emulator, a cycle-accurate NVMM simulator and a real DCPMM. Then, the program behaviour differences among the three models are evaluated with SPEC CPU programs. As a result, the fine-grain model reveals that the program execution time is affected by the frequency of NVMM memory requests rather than the cache hit ratio. Comparing the fine-grain model with the coarse-grain model under the condition that the former has a longer total write latency than the latter, the former shows lower execution time for four of fourteen programs because of the bank-level parallelism and the row-buffer access locality that it exploits.

  • Design and Implementation of LoRa-Based Wireless Sensor Network with Embedded System for Smart Agricultural Recycling Rapid Processing Factory

    Chia-Yu WANG  Chia-Hsin TSAI  Sheng-Chung WANG  Chih-Yu WEN  Robert Chen-Hao CHANG  Chih-Peng FAN  

     
    INVITED PAPER

      Publicized:
    2021/02/25
      Vol:
    E104-D No:5
      Page(s):
    563-574

    In this paper, the effective Long Range (LoRa) based wireless sensor network is designed and implemented to provide the remote data sensing functions for the planned smart agricultural recycling rapid processing factory. The proposed wireless sensor network transmits the sensing data from various sensors, which measure the values of moisture, viscosity, pH, and electrical conductivity of agricultural organic wastes for the production and circulation of organic fertilizers. In the proposed wireless sensor network design, the LoRa transceiver module is used to provide data transmission functions at the sensor node, and the embedded platform by Raspberry Pi module is applied to support the gateway function. To design the cloud data server, the MySQL methodology is applied for the database management system with Apache software. The proposed wireless sensor network for data communication between the sensor node and the gateway supports a simple one-way data transmission scheme and three half-duplex two-way data communication schemes. By experiments, for the one-way data transmission scheme under the condition of sending one packet data every five seconds, the packet data loss rate approaches 0% when 1000 packet data is transmitted. For the proposed two-way data communication schemes, under the condition of sending one packet data every thirty seconds, the average packet data loss rates without and with the data-received confirmation at the gateway side can be 3.7% and 0%, respectively.

  • Model Checking of Real-Time Properties for Embedded Assembly Program Using Real-Time Temporal Logic RTCTL and Its Application to Real Microcontroller Software

    Yajun WU  Satoshi YAMANE  

     
    PAPER-Software System

      Publicized:
    2020/01/06
      Vol:
    E103-D No:4
      Page(s):
    800-812

    For embedded systems, verifying both real-time properties and logical validity is important. An embedded system is required not only to operate accurately but also to satisfy strict real-time properties. Verifying real-time properties is a key problem in model checking. In order to verify real-time properties of assembly programs, we develop a simulator and propose a model checking method for verifying assembly programs. Simultaneously, we propose a timed Kripke structure and implement the simulator of the robot's processor to be verified. The proposed timed Kripke structure extends the Kripke structure with execution time. For the input assembly program, the simulator generates a timed Kripke structure by dynamic program analysis. We also implement a model checker that, once the timed Kripke structure has been generated, verifies whether it satisfies RTCTL formulas. Finally, to evaluate the proposed method, we conduct experiments with the implementation of the verification system. To address a real problem, we have experimented with real microcontroller software.

  • Essential Roles, Challenges and Development of Embedded MCU Micro-Systems to Innovate Edge Computing for the IoT/AI Age Open Access

    Takashi KONO  Yasuhiko TAITO  Hideto HIDAKA  

     
    INVITED PAPER-Integrated Electronics

      Vol:
    E103-C No:4
      Page(s):
    132-143

    Embedded system approaches to edge computing in IoT implementations are proposed and discussed. Rationales of edge computing and essential core capabilities for IoT data supply innovation are identified. Then, innovative roles and development of MCU and embedded flash memory are illustrated by technology and applications, expanding from CPS to big-data and nomadic/autonomous elements of IoT requirements. Conclusively, a technology roadmap construction specific to IoT is proposed.

  • Fast Inference of Binarized Convolutional Neural Networks Exploiting Max Pooling with Modified Block Structure

    Ji-Hoon SHIN  Tae-Hwan KIM  

     
    LETTER-Software System

      Publicized:
    2019/12/03
      Vol:
    E103-D No:3
      Page(s):
    706-710

    This letter presents a novel technique to achieve a fast inference of the binarized convolutional neural networks (BCNN). The proposed technique modifies the structure of the constituent blocks of the BCNN model so that the input elements for the max-pooling operation are binary. In this structure, if any of the input elements is +1, the result of the pooling can be produced immediately; the proposed technique eliminates such computations that are involved to obtain the remaining input elements, so as to reduce the inference time effectively. The proposed technique reduces the inference time by up to 34.11%, while maintaining the classification accuracy.
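
    The early-exit idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the letter's implementation: the function name `binary_max_pool` is hypothetical, and each pooling input is modelled as a zero-argument callable so that the (possibly expensive) computation of an input can be skipped.

```python
def binary_max_pool(lazy_inputs):
    """Max-pool over binary (+1 / -1) inputs that are computed on demand.

    lazy_inputs: iterable of zero-argument callables, each returning +1 or -1.
    Returns (pooled_value, number_of_inputs_actually_computed).
    """
    evaluated = 0
    for compute in lazy_inputs:
        evaluated += 1
        if compute() == 1:
            # +1 is the largest possible value, so the pooled result is known
            # and the remaining inputs never need to be computed.
            return 1, evaluated
    # No input was +1, so every input had to be computed and the max is -1.
    return -1, evaluated
```

    For a 2×2 window whose second input is +1, `binary_max_pool([lambda: -1, lambda: 1, lambda: -1, lambda: -1])` returns `(1, 2)`: only two of the four inputs are computed.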

  • Virtual Address Remapping with Configurable Tiles in Image Processing Applications

    Jae Young HUR  

     
    PAPER-Computer System

      Publicized:
    2019/10/17
      Vol:
    E103-D No:2
      Page(s):
    309-320

    Conventional linear or tiled address maps can degrade performance and memory utilization when traffic patterns are not matched with the underlying address map. The address map is usually fixed at design time. Accordingly, it is difficult to adapt to given applications. Modern embedded systems usually accommodate memory management units (MMUs). As a result, depending on virtual address patterns, the system can suffer from performance overheads due to page table walks. To alleviate this performance overhead, we propose to cluster and rearrange tiles to construct an MMU-aware configurable address map. To construct the clustered tiled map, a generic tile number remapping algorithm is presented. In the presented scheme, an address map is configured based on an adaptive dimensioning algorithm. Considering image processing applications, a design, an analysis, an implementation, and simulations are conducted. The results indicate that the proposed method can improve performance and memory utilization with moderate hardware costs.

  • Hardware Architecture for High-Speed Object Detection Using Decision Tree Ensemble

    Koichi MITSUNARI  Jaehoon YU  Takao ONOYE  Masanori HASHIMOTO  

     
    PAPER

      Vol:
    E101-A No:9
      Page(s):
    1298-1307

    Visual object detection on embedded systems involves a multi-objective optimization problem in the presence of trade-offs between power consumption, processing performance, and detection accuracy. For a new Pareto solution with high processing performance and low power consumption, this paper proposes a hardware architecture for decision tree ensemble using multiple channels of features. For efficient detection, the proposed architecture utilizes the dimensionality of feature channels in addition to parallelism in image space and adopts task scheduling to attain random memory access without conflict. Evaluation results show that an FPGA implementation of the proposed architecture with an aggregated channel features pedestrian detector can process 229 million samples per second at 100MHz operation frequency while it requires a relatively small amount of resources. Consequently, the proposed architecture achieves 350fps processing performance for 1080P Full HD images and outperforms conventional object detection hardware architectures developed for embedded systems.

  • Evaluation of Register Number Abstraction for Enhanced Instruction Register Files

    Naoki FUJIEDA  Kiyohiro SATO  Ryodai IWAMOTO  Shuichi ICHIKAWA  

     
    PAPER-Computer System

      Publicized:
    2018/03/14
      Vol:
    E101-D No:6
      Page(s):
    1521-1531

    Instruction set randomization (ISR) is a cost-effective obfuscation technique that modifies or enhances the relationship between instructions and machine languages. An Instruction Register File (IRF), a list of frequently used instructions, can be used for ISR by providing the way of indirect access to them. This study examines the IRF that integrates a positional register, which was proposed as a supplementary unit of the IRF, for the sake of tamper resistance. According to our evaluation, with a new design for the contents of the positional register, the measure of tamper resistance was increased by 8.2% at a maximum, which corresponds to a 32.2% increase in the size of the IRF. The number of logic elements increased by the addition of the positional register was 3.5% of its baseline processor.

  • Static Mapping of Parallelizable Tasks under Deadline Constraints

    Yining XU  Ittetsu TANIGUCHI  Hiroyuki TOMIYAMA  

     
    LETTER

      Vol:
    E100-A No:7
      Page(s):
    1500-1502

    Task mapping is one of the most important design processes in embedded manycore systems. This paper proposes a static task mapping technique for manycore real-time systems. The technique minimizes the number of cores while satisfying deadline constraints of individual tasks.

  • BFWindow: Speculatively Checking Data Property Consistency against Buffer Overflow Attacks

    Jinli RAO  Zhangqing HE  Shu XU  Kui DAI  Xuecheng ZOU  

     
    PAPER

      Publicized:
    2016/05/31
      Vol:
    E99-D No:8
      Page(s):
    2002-2009

    Buffer overflow is one of the main approaches to get control of vulnerable programs. This paper presents a protection technique called BFWindow for performance- and resource-sensitive embedded systems. By coloring data structures in memory with a single property bit associated with each byte and extending the target memory block to a BFWindow(2), it validates each memory write by speculatively checking the consistency of data properties within the extended buffer window. Property bits are generated statically by the compiler and checked by hardware at runtime. They are transparent to users. Experimental results show that the proposed mechanism is effective in preventing sequential memory writes from crossing buffer boundaries, which is the common scenario of buffer overflow exploitations. The performance overhead for practical protection mode across embedded system benchmarks is under 1%.

  • Static Mapping of Multiple Parallel Applications on Non-Hierarchical Manycore Embedded Systems

    Yining XU  Yang LIU  Junya KAIDA  Ittetsu TANIGUCHI  Hiroyuki TOMIYAMA  

     
    LETTER

      Vol:
    E99-A No:7
      Page(s):
    1417-1419

    This paper proposes a static application mapping technique, based on integer linear programming, for non-hierarchical manycore embedded systems. Unlike previous work which was designed for hierarchical manycore SoCs, this work allows more flexible application mapping to achieve higher performance. The experimental results show the effectiveness of this work.

  • A New Method of Storing Integral Image for Memory Efficiency Using Modified Block Structure

    Su-hyun LEE  Yong-jin JEONG  

     
    LETTER-Image Processing and Video Processing

      Publicized:
    2015/07/13
      Vol:
    E98-D No:10
      Page(s):
    1888-1891

    An integral image is the sum of input image pixel values. It is mainly used to speed up box filter operations, such as Haar-like features. However, the large memory capacity required for integral image data can be an obstacle in an embedded environment with limited hardware. In previous research, [5] reduced the size of integral image memory using a 2×2 block structure with additional calculations. It can be easily extended to an n×n block structure for further reduction, but it requires more additional calculations. In this paper, we propose a new block structure for the integral image by modifying the location of the reference pixel in the block. It results in far fewer additional calculations by reducing the number of memory accesses, while keeping the same amount of memory as the original block structure.

  • An Integrated Framework for Energy Optimization of Embedded Real-Time Applications

    Hideki TAKASE  Gang ZENG  Lovic GAUTHIER  Hirotaka KAWASHIMA  Noritoshi ATSUMI  Tomohiro TATEMATSU  Yoshitake KOBAYASHI  Takenori KOSHIRO  Tohru ISHIHARA  Hiroyuki TOMIYAMA  Hiroaki TAKADA  

     
    PAPER-High-Level Synthesis and System-Level Design

      Vol:
    E97-A No:12
      Page(s):
    2477-2487

    This paper presents a framework for reducing the energy consumption of embedded real-time systems. We implemented the presented framework as both an optimization toolchain and an energy-aware real-time operating system. The framework consists of the integration of multiple techniques to optimize the energy consumption. The main idea behind our approach is to utilize trade-offs between the energy consumption and the performance of different processor configurations during task checkpoints, and to maintain memory allocation during task context switches. In our framework, a target application is statically analyzed at both intra-task and inter-task levels. Based on these analyzed results, runtime optimization is performed in response to the behavior of the application. A case study shows that our toolchain and real-time operating systems have achieved energy reduction while satisfying the real-time performance. The toolchain has also been successfully applied to a practical application.

  • Static Mapping with Dynamic Switching of Multiple Data-Parallel Applications on Embedded Many-Core SoCs

    Ittetsu TANIGUCHI  Junya KAIDA  Takuji HIEDA  Yuko HARA-AZUMI  Hiroyuki TOMIYAMA  

     
    PAPER-Fundamentals of Information Systems

      Vol:
    E97-D No:11
      Page(s):
    2827-2834

    This paper studies mapping techniques of multiple applications on embedded many-core SoCs. The mapping techniques proposed in this paper are static which means the mapping is decided at design time. The mapping techniques take into account both inter-application and intra-application parallelism in order to fully utilize the potential parallelism of the many-core architecture. Additionally, the proposed static mapping supports dynamic application switching, which means the applications mapped onto the same cores are switched to each other at runtime. Two approaches are proposed for static mapping: one approach is based on integer linear programming and the other is based on a greedy algorithm. Experimental results show the effectiveness of the proposed techniques.

  • A New Integral Image Structure for Memory Size Reduction

    Su-hyun LEE  Yong-jin JEONG  

     
    LETTER-Image Processing and Video Processing

      Vol:
    E97-D No:4
      Page(s):
    998-1000

    An integral image is the sum of input image pixel values. It is mainly used to speed up box filter operations, such as Haar-like features. However, the large memory required for integral image data can be an obstacle in an embedded environment with limited hardware. Therefore, an efficient method to store the integral image is necessary. In this paper, we propose a memory size reduction method for the integral image. The method uses four types of image information: an integral image, a row integral image, a column integral image, and an input image. Using this method, integral image memory can be reduced by 42.6% on a 640×480 8-bit gray-scale input image. The same idea can be applied to larger images.
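
    For context, the conventional (uncompressed) integral image that this letter and the related one above start from can be sketched as follows. This is the standard textbook structure, not the memory-reduced method proposed by the authors; the function names `integral_image` and `box_sum` and the one-pixel zero border are choices made here for illustration.

```python
def integral_image(img):
    """Build an integral image with a zero border.

    img: list of rows of non-negative integers (h x w).
    Returns ii of size (h+1) x (w+1), where ii[y][x] is the sum of
    img[0..y-1][0..x-1].
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom][left:right] using four table lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]
```

    The memory pressure mentioned in the abstract follows from the value range: for a 640×480 8-bit image the largest possible entry is 640 × 480 × 255 = 78,336,000, which needs 27 bits per entry in this plain layout, compared with 8 bits per input pixel.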

  • Efficient Implementation of Statistical Model-Based Voice Activity Detection Using Taylor Series Approximation

    Chungsoo LIM  Soojeong LEE  Jae-Hun CHOI  Joon-Hyuk CHANG  

     
    LETTER-Digital Signal Processing

      Vol:
    E97-A No:3
      Page(s):
    865-868

    In this letter, we propose a simple but effective technique that improves statistical model-based voice activity detection (VAD) by both reducing computational complexity and increasing detection accuracy. The improvements are made by applying Taylor series approximations to the exponential and logarithmic functions in the VAD algorithm based on an in-depth analysis of the algorithm. Experiments performed on a smartphone as well as on a desktop computer with various background noises confirm the effectiveness of the proposed technique.
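
    As a generic illustration of replacing exponential and logarithmic functions with truncated Taylor series, consider the sketch below. The abstract does not state the expansion points or the number of terms used in the letter, so the Maclaurin expansions, the default of four terms, and the function names `exp_taylor` and `log1p_taylor` are assumptions made here.

```python
import math

def exp_taylor(x, terms=4):
    """Approximate exp(x) by 1 + x + x^2/2! + ... using `terms` terms."""
    total, term = 1.0, 1.0
    for n in range(1, terms):
        term *= x / n          # term is now x^n / n!
        total += term
    return total

def log1p_taylor(x, terms=4):
    """Approximate ln(1 + x) by x - x^2/2 + x^3/3 - ... (valid for |x| < 1)."""
    total, power = 0.0, 1.0
    for n in range(1, terms + 1):
        power *= x             # power is now x^n
        total += (power / n) if n % 2 else (-power / n)
    return total

if __name__ == "__main__":
    # Compare against the library functions for a small argument.
    x = 0.1
    print(exp_taylor(x), math.exp(x))
    print(log1p_taylor(x), math.log1p(x))
```

    Both approximations use only multiplications and additions, which is what makes this kind of substitution attractive on processors without fast transcendental functions; the accuracy degrades as |x| grows, so the usable input range has to be checked for the application at hand.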
