Yuya DEGAWA Toru KOIZUMI Tomoki NAKAMURA Ryota SHIOYA Junichiro KADOMOTO Hidetsugu IRIE Shuichi SAKAI
One of the performance bottlenecks of a processor is the front-end that supplies instructions. Various techniques, such as cache replacement algorithms and hardware prefetching, have been investigated to facilitate smooth instruction supply at the front-end and to improve processor performance. In these approaches, one of the most important factors has been the reduction in the number of instruction cache misses. By using the number of instruction cache misses or derived factors, previous studies have explained the performance improvements achieved by their proposed methods. However, we found that the number of instruction cache misses does not always explain performance changes well in modern processors. This is because the front-end in modern processors handles subsequent instruction cache misses in overlap with earlier ones. Based on this observation, we propose a novel factor: the number of miss regions. We define a region as a sequence of instructions from one branch misprediction to the next, and a miss region as a region that contains one or more instruction cache misses. At the boundary of each region, the pipeline is flushed owing to a branch misprediction. Thus, cache misses after this boundary are not handled in overlap with cache misses before the boundary. As a result, the number of miss regions is equal to the number of cache misses that are processed without overlap. In this paper, we demonstrate through mathematical models and simulation results that the number of miss regions explains the variation in performance well. The results show that the model explains cycles per instruction with an average error of 1.0% and a maximum error of 4.1% when applying an existing prefetcher to the instruction cache. The idea of miss regions highlights that instruction cache misses and branch mispredictions interact with each other in processors with a decoupled front-end. We hope that considering this interaction will motivate the development of fast performance estimation methods and new microarchitectural methods.
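The definition of a miss region can be illustrated with a minimal Python sketch. The trace encoding is an assumption made only for this example and does not come from the paper: `"M"` marks an instruction cache miss, `"B"` marks a branch misprediction (a region boundary), and any other symbol is an ordinary event. The function name `count_miss_regions` is likewise hypothetical.

```python
from typing import Iterable


def count_miss_regions(events: Iterable[str]) -> tuple[int, int]:
    """Return (instruction cache misses, miss regions) for an event trace.

    Assumed encoding (illustrative only): "M" is an instruction cache miss,
    "B" is a branch misprediction that ends the current region, and every
    other symbol is ignored.
    """
    misses = 0
    miss_regions = 0
    region_has_miss = False  # has the current region already seen a miss?

    for event in events:
        if event == "M":
            misses += 1
            if not region_has_miss:
                # First miss in this region: the region becomes a miss region.
                miss_regions += 1
                region_has_miss = True
        elif event == "B":
            # A misprediction flushes the pipeline and starts a new region.
            region_has_miss = False

    return misses, miss_regions


if __name__ == "__main__":
    # Regions: "MM" | "I" | "MIMM" -> 5 misses but only 2 miss regions.
    print(count_miss_regions("MMBIBMIMMB"))
```

In this toy trace, five misses collapse into two miss regions, because misses within the same region are assumed to overlap with one another.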
Rin OISHI Junichiro KADOMOTO Hidetsugu IRIE Shuichi SAKAI
As more and more programs handle personal information, the demand for secure handling of data is increasing. A protocol that satisfies this demand is called secure function evaluation (SFE), and it has attracted much attention from a privacy protection perspective. In two-party SFE, two mutually untrustworthy parties compute an arbitrary function on their respective secret inputs without disclosing any information other than the output of the function. For example, it is possible to execute a program while protecting private information, such as genomic information. The garbled circuit (GC) is an efficient method for this purpose: it is a method of program obfuscation in which the program is divided into gates and the output is calculated using a symmetric key cipher for each gate. However, GC is computationally expensive and has a significant overhead even with an accelerator. We focus on hardware acceleration because GC is limited to certain types of calculations, such as encryption and XOR. In this paper, we propose an architecture, based on the latest FPGA-based GC accelerator, that accelerates garbling by running multiple garbling engines simultaneously. In this architecture, managers are introduced to perform multiple rows of pipeline processing simultaneously. We also propose an optimized implementation of RAM for this FPGA accelerator. As a result, the proposed architecture achieves an average performance improvement of 26% in garbling the same set of programs, compared to the state-of-the-art (SOTA) garbling accelerator.
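The per-gate garbling step described above can be sketched for a single AND gate. This is a textbook-style toy and not the accelerator architecture of the paper. As assumptions made only for illustration, SHA-256 stands in for the symmetric key cipher, an all-zero tag marks the row that decrypts correctly, and the names `garble_and_gate` and `evaluate` are hypothetical. It is not suitable for real-world use.

```python
import hashlib
import secrets

LABEL_LEN = 16
TAG = bytes(LABEL_LEN)  # all-zero tag used to recognise the correct row


def _pad(label_a: bytes, label_b: bytes) -> bytes:
    """Derive a 32-byte pad from two input labels (SHA-256 as a stand-in cipher)."""
    return hashlib.sha256(label_a + label_b).digest()


def _xor(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))


def garble_and_gate():
    """Garbler side: pick random labels and build the encrypted truth table."""
    # labels[wire][bit] is the random label that encodes `bit` on `wire`.
    labels = {
        wire: (secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN))
        for wire in ("a", "b", "out")
    }
    table = []
    for bit_a in (0, 1):
        for bit_b in (0, 1):
            out_label = labels["out"][bit_a & bit_b]
            pad = _pad(labels["a"][bit_a], labels["b"][bit_b])
            table.append(_xor(pad, out_label + TAG))
    # Shuffle so that the row position reveals nothing about the inputs.
    secrets.SystemRandom().shuffle(table)
    return labels, table


def evaluate(table, label_a: bytes, label_b: bytes) -> bytes:
    """Evaluator side: recover the output label from one label per input wire."""
    pad = _pad(label_a, label_b)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_LEN:] == TAG:
            return plain[:LABEL_LEN]
    raise ValueError("no row decrypted correctly")


if __name__ == "__main__":
    labels, table = garble_and_gate()
    result = evaluate(table, labels["a"][1], labels["b"][1])
    print(result == labels["out"][1])
```

The evaluator only ever holds one label per wire, so it learns the output label without learning which input bits the labels encode. The work per gate reduces to a few cipher calls and XORs, which is the kind of restricted computation the abstract cites as the motivation for hardware acceleration.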
Li-Chung HSU Junichiro KADOMOTO So HASEGAWA Atsutake KOSUGE Yasuhiro TAKE Tadahiro KURODA
ThruChip interface (TCI) is an emerging wireless interface in three-dimensional (3-D) integrated circuit (IC) technology. However, the TCI physical design guidelines remain unclear. In this paper, a ThruChip test chip is designed and fabricated to explore design guidelines. Three physical design scenarios for the inductive coupling interface are deployed in the test chip: baseline, power mesh, and dummy metal fill. In the baseline scenario, the test chip measurement results show that thinning the chip or enlarging the coil dimensions can further reduce TCI power. The power mesh scenario shows that the eddy current on the power mesh can dramatically reduce the magnetic pulse signal and thus possibly cause TCI to fail. A power mesh splitting method is proposed to effectively suppress the eddy current impact while minimizing the impact on the power mesh structure. The simulation results show that the proposed method can recover 77% of the coupling coefficient loss while introducing only an additional 0.5% IR drop. In the dummy metal fill scenario, dummy metal fill enclosed within TCI coils has no impact on TCI transmission and thus can be ignored.
Junichiro KADOMOTO So HASEGAWA Yusuke KIUCHI Atsutake KOSUGE Tadahiro KURODA
This paper presents an analysis and a simple design guideline for a ThruChip Interface (TCI) located near an LC-VCO, which is used in high-speed SoCs. The electromagnetic interference (EMI) from TCI channels to the LC-VCO is analyzed and evaluated. The accuracy of the analysis and design guidelines is verified using a test chip.