Hongjie XU Jun SHIOMI Tohru ISHIHARA Hidetoshi ONODERA
This paper focuses on the power-area trade-off axis of memory systems. In contrast to the power-performance-area trade-off applied to traditional high-performance caches, this paper targets the edge processing environment, which is becoming increasingly important in the Internet of Things (IoT) era. A new power-oriented trade-off is proposed for on-chip cache architecture. As a case study, this paper exploits the good energy efficiency of Standard-Cell Memory (SCM) operating in the near-threshold voltage region and the good area efficiency of Static Random Access Memory (SRAM). A hybrid 2-level on-chip cache structure is first introduced as a replacement for a 6T-SRAM cache as the L0 cache to reduce energy consumption. This paper then proposes a method for finding the best capacity combination of SCM and SRAM, which minimizes the energy consumption of the hybrid cache under a specific cache area constraint. Simulation results using a 65-nm process technology show that replacing the conventional SRAM instruction cache with the hybrid 2-level cache reduces energy consumption by up to 80% without increasing the die area. The results also show that energy consumption can be reduced if the area constraint for the proposed hybrid cache system is less than the area equivalent to an 8 kB SRAM. If the target operating frequency is less than 100 MHz, energy reduction can be achieved, which implies that the proposed cache system is suitable for low-power systems where a moderate processing speed is required.
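The capacity search described in the abstract above can be pictured as a small constrained minimization. The following is only a minimal sketch under stated assumptions, not the authors' method: every constant, the toy `hit_rate` model, and the function names `energy_per_access` and `best_combination` are hypothetical placeholders chosen for illustration, and none of the numbers come from the paper.

```python
from itertools import product

# Hypothetical placeholder figures in arbitrary units (NOT from the paper).
SCM_AREA_PER_KB = 4.0          # SCM assumed less area-efficient than SRAM
SRAM_AREA_PER_KB = 1.0
SCM_ENERGY_PER_ACCESS = 1.0    # SCM assumed cheaper per access than SRAM
SRAM_ENERGY_PER_ACCESS = 5.0
MISS_ENERGY = 50.0             # assumed cost of going beyond the cache


def hit_rate(capacity_kb):
    """Toy saturating hit-rate model (an assumption, not measured data)."""
    return capacity_kb / (capacity_kb + 1.0)


def energy_per_access(scm_kb, sram_kb):
    """Average energy per access for an SCM level backed by an SRAM level."""
    if scm_kb == 0:
        l0_energy, l0_miss = 0.0, 1.0   # no SCM level present
    else:
        l0_energy, l0_miss = SCM_ENERGY_PER_ACCESS, 1.0 - hit_rate(scm_kb)
    if sram_kb == 0:
        l1_energy, l1_miss = 0.0, 1.0   # no SRAM level present
    else:
        l1_energy, l1_miss = SRAM_ENERGY_PER_ACCESS, 1.0 - hit_rate(sram_kb)
    return l0_energy + l0_miss * (l1_energy + l1_miss * MISS_ENERGY)


def best_combination(area_budget, sizes_kb=(0, 1, 2, 4, 8)):
    """Exhaustively search capacity pairs that fit within the area budget."""
    best = None
    for scm_kb, sram_kb in product(sizes_kb, repeat=2):
        area = scm_kb * SCM_AREA_PER_KB + sram_kb * SRAM_AREA_PER_KB
        if area > area_budget:
            continue  # violates the area constraint
        energy = energy_per_access(scm_kb, sram_kb)
        if best is None or energy < best[2]:
            best = (scm_kb, sram_kb, energy)
    return best


if __name__ == "__main__":
    print(best_combination(area_budget=8.0))
```

An exhaustive search is used here only because the candidate set is tiny; a real evaluation would replace the toy models with simulated hit rates and technology-specific energy and area figures.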
Hongjie XU Jun SHIOMI Hidetoshi ONODERA
Hardware accelerators are designed to support a specialized processing dataflow for ever-changing deep neural networks (DNNs) under various processing environments. This paper introduces two hardware properties to describe the cost of data movement in each memory hierarchy. Based on these hardware properties, this paper proposes a set of evaluation metrics that evaluate the number of memory accesses and the required memory capacity for a given specialized processing dataflow. The proposed metrics can analytically predict the energy, throughput, and area of a hardware design without a detailed implementation. Once a processing dataflow and the constraints on hardware resources are determined, the proposed evaluation metrics quickly quantify the expected hardware benefits, thereby reducing design time.
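To illustrate what an analytical estimate driven by per-level access counts might look like, here is a minimal sketch. The abstract does not name the two hardware properties, so the two per-level quantities used below (`energy_per_access` and `accesses_per_cycle`) are assumptions made purely for illustration, as are the `MemoryLevel` type, the `estimate` function, and all numeric values.

```python
from dataclasses import dataclass


@dataclass
class MemoryLevel:
    name: str
    energy_per_access: float    # hypothetical property, arbitrary energy units
    accesses_per_cycle: float   # hypothetical property, i.e. bandwidth


def estimate(levels, access_counts):
    """Return (total energy, cycle lower bound) from per-level access counts.

    Energy is the sum of access count times per-access energy over all levels.
    The cycle count is bounded by the slowest level, assuming the levels
    operate in parallel (an assumption of this sketch).
    """
    energy = sum(lv.energy_per_access * access_counts[lv.name] for lv in levels)
    cycles = max(access_counts[lv.name] / lv.accesses_per_cycle for lv in levels)
    return energy, cycles


if __name__ == "__main__":
    # Placeholder hierarchy and access counts, not taken from the paper.
    levels = [
        MemoryLevel("register", energy_per_access=1.0, accesses_per_cycle=4.0),
        MemoryLevel("dram", energy_per_access=100.0, accesses_per_cycle=1.0),
    ]
    counts = {"register": 400, "dram": 10}
    print(estimate(levels, counts))
```

The point of such a model is that, once a dataflow fixes the access counts, energy and throughput follow from simple arithmetic rather than from a detailed implementation.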