Design of a Register Cache System with an Open Source Process Design Kit for 45nm Technology

Junji YAMADA; Ushio JIMBO; Ryota SHIOYA; Masahiro GOSHIMA; Shuichi SAKAI

doi:10.1587/transele.E100.C.232

Summary:

An 8-issue superscalar core generally requires a 24-port RAM for the register file. The area and energy consumption of a multiported RAM increase in proportion to the square of the number of ports. A register cache can reduce the area and energy consumption of the register file. However, earlier register cache systems suffer from lower IPC caused by register cache misses. We therefore proposed the Non-Latency-Oriented Register Cache System (NORCS), which solves the IPC problem with a modified pipeline. In the original article, we evaluated NORCS mainly from the viewpoint of microarchitecture and showed that it maintains almost the same IPC as conventional register files. Researchers at NVIDIA adopted the same idea for their GPUs. However, the evaluation was insufficient from the viewpoint of LSI design: the original article used CACTI to evaluate area and energy consumption, and CACTI is a design space exploration tool for cache design that adopts some rough approximations. This paper therefore presents a design of NORCS with FreePDK45, an open source process design kit for 45nm technology. We performed manual layout of the memory cells and arrays of NORCS and ran SPICE simulations with RC parasitics extracted from the layout. The results show that, relative to a full-port register file, an 8-entry NORCS achieves reductions of 75.2% in area and 48.2% in energy consumption. The results also include the latency, which we did not present in our original article. The critical-path latencies are 307ps and 318ps for an 8-entry NORCS and a conventional multiported register file, respectively, when the same two cycles are allocated to register file read.
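The port count and the quadratic scaling mentioned in the summary can be illustrated with a short back-of-the-envelope sketch. It assumes that each issued instruction needs two read ports and one write port, which is one common way to arrive at 24 ports for an 8-issue core; the summary does not state this breakdown. The function names are hypothetical, and the scaling function is only the idealized square law quoted above, not the layout-based results of the paper.

```python
def port_count(issue_width, reads_per_inst=2, writes_per_inst=1):
    """Ports needed by a full-port register file.

    Assumption (not stated in the summary): every issued instruction
    may read two source operands and write one destination.
    """
    return issue_width * (reads_per_inst + writes_per_inst)


def relative_cost(ports, baseline_ports):
    """Idealized area/energy ratio under the square-of-ports rule."""
    return (ports / baseline_ports) ** 2


full_ports = port_count(8)  # 8 * (2 + 1) = 24 ports

# Under the idealized square law, halving the ports quarters the cost.
half_port_ratio = relative_cost(full_ports // 2, full_ports)  # 0.25

# Figures reported in the summary for an 8-entry NORCS,
# expressed as the fraction of the full-port register file that remains.
area_remaining = 1 - 0.752
energy_remaining = 1 - 0.482

# Critical-path latencies reported in the summary, in picoseconds.
latency_gap_ps = 318 - 307

print(full_ports, half_port_ratio)
print(round(area_remaining, 3), round(energy_remaining, 3), latency_gap_ps)
```

The reported reductions come from manual layout and SPICE simulation, so they should not be expected to match the idealized square law exactly.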

Publication: IEICE TRANSACTIONS on Electronics Vol.E100-C No.3 pp.232-244

Publication Date: 2017/03/01

Online ISSN: 1745-1353

DOI: 10.1587/transele.E100.C.232

Type of Manuscript: Special Section PAPER (Special Section on Low-Power and High-Speed Chips)

Authors

Junji YAMADA
  The University of Tokyo
Ushio JIMBO
  SOKENDAI (Graduate University for Advanced Studies)
Ryota SHIOYA
  Nagoya University
Masahiro GOSHIMA
  National Institute of Informatics
Shuichi SAKAI
  The University of Tokyo

Cite this

Junji YAMADA, Ushio JIMBO, Ryota SHIOYA, Masahiro GOSHIMA, Shuichi SAKAI, "Design of a Register Cache System with an Open Source Process Design Kit for 45nm Technology" in IEICE TRANSACTIONS on Electronics, vol. E100-C, no. 3, pp. 232-244, March 2017, doi: 10.1587/transele.E100.C.232.
URL: https://globals.ieice.org/en_transactions/electronics/10.1587/transele.E100.C.232/_p

@ARTICLE{e100-c_3_232,
author={Junji YAMADA and Ushio JIMBO and Ryota SHIOYA and Masahiro GOSHIMA and Shuichi SAKAI},
journal={IEICE TRANSACTIONS on Electronics},
title={Design of a Register Cache System with an Open Source Process Design Kit for 45nm Technology},
year={2017},
volume={E100-C},
number={3},
pages={232-244},
keywords={},
doi={10.1587/transele.E100.C.232},
ISSN={1745-1353},
month={March},}

TY - JOUR
TI - Design of a Register Cache System with an Open Source Process Design Kit for 45nm Technology
T2 - IEICE TRANSACTIONS on Electronics
SP - 232
EP - 244
AU - Junji YAMADA
AU - Ushio JIMBO
AU - Ryota SHIOYA
AU - Masahiro GOSHIMA
AU - Shuichi SAKAI
PY - 2017
DO - 10.1587/transele.E100.C.232
JO - IEICE TRANSACTIONS on Electronics
SN - 1745-1353
VL - E100-C
IS - 3
JA - IEICE TRANSACTIONS on Electronics
Y1 - March 2017
ER -