1-9hit |
Yukihiro IGUCHI Tsutomu SASAO Munehiro MATSUURA
Three types of ternary decision diagrams (TDDs) are considered: AND -TDDs, EXOR-TDDs, and Kleene-TDDs. Kleene-TDDs are useful for logic simulation in the presence of unknown inputs. Let N(BDD:f), N(AND-TDD:f), and N(EXOR-TDD:f) be the number of non-terminal nodes in the BDD, the AND-TDD, and the EXOR-TDD for f, respectively. Let N(Kleene-TDD:) be the number of non-terminal nodes in the Kleene -TDD for , where is the regular ternary function corresponding to f. Then N(BDD:f) N(TDD:f). For parity functions, N(BDD:f)=N(AND-TDD:f)=N(EXOR-TDD:f)=N(Kleene-TDD:). For unate functions,N(BDD:f)=N(AND-TDD:f). The sizes of Kleene-TDDs are O(3n/n), and O(n3) for arbitrary functions, and symmetric functions, respectively. There exist a 2n-variable function, where Kleene-TDDs require O(n) nodes with the best order, while O(3n) nodes in the worst order.
Hui QIN Tsutomu SASAO Yukihiro IGUCHI
This paper addresses a pipelined partial rolling (PPR) architecture for the AES encryption. The key technique is the PPR architecture. With the proposed architecture on the Altera Stratix FPGA, two PPR implementations achieve 6.45 Gbps throughput and 12.78 Gbps throughput, respectively. Compared with the unrolling implementation that achieves a throughput of 22.75 Gbps on the same FPGA, the two PPR implementations improve the memory efficiency (i.e., throughput divided by the size of memory for core) by 13.4% and 12.3%, respectively, and reduce the amount of the memory by 75% and 50%, respectively. Also, the PPR implementation has a up to 9.83% higher memory efficiency than the fastest previous FPGA implementation known to date. In terms of resource efficiency (i.e., throughput divided by the equivalent logic element or slice), one PPR implementation offers almost the same as the rolling implementation, and the other PPR implementation offers a medium value between the rolling implementation and the unrolling implementation that has the highest resource efficiency. However, the two PPR implementations can be implemented on the minimum-sized Stratix FPGA while the unrolling implementation cannot. The PPR architecture fills the gap between unrolling and rolling architectures and is suitable for small and medium-sized FPGAs.
Shinobu NAGAYAMA Tsutomu SASAO Yukihiro IGUCHI Munehiro MATSUURA
This paper considers Quasi-Reduced ordered Multi-valued Decision Diagrams with k bits (QRMDD(k)s) to represent binary logic functions. Experimental results show relations between the values of k and the numbers of nodes, the memory sizes, the numbers of memory accesses, and area-time complexity for QRMDD(k). For many benchmark functions, the numbers of nodes and memory accesses for QRMDD(k)s are nearly equal to of the corresponding Quasi-Reduced ordered Binary Decision Diagrams (QRBDDs), and the memory sizes and the area-time complexities for QRMDD(k)s are minimum when k = 2 and k = 3-6, respectively.
Hui QIN Tsutomu SASAO Munehiro MATSUURA Shinobu NAGAYAMA Kazuyuki NAKAMURA Yukihiro IGUCHI
A look-up table (LUT) cascade is a new type of a programmable logic device (PLD) that provides an alternative way to realize multiple-output functions. An LUT ring is an emulator for an LUT cascade. Compared with an LUT cascade, the LUT ring is more flexible. In this paper we discuss the realization of multiple-output functions with the LUT ring. Unlike an FPGA realization of a logic function, accurate prediction of the delay time is easy in an LUT ring realization. A prototype of an LUT ring has been custom-designed with 0.35 µm CMOS technology. Simulation results show that the LUT ring is 80 to 241 times faster than software programs on an SH-1, and 36 to 93 times faster than software programs on a PentiumIII when the frequencies for the LUT ring and the MPUs are the same, but is slightly slower than commercial FPGAs.
Munehiro MATSUURA Tsutomu SASAO Jon T. BUTLER Yukihiro IGUCHI
A shared binary decision diagram (SBDD) represents a multiple-output function, where nodes are shared among BDDs representing the various outputs. A partitioned SBDD consists of two or more SBDDs that share nodes. The separate SBDDs are optimized independently, often resulting in a reduction in the number of nodes over a single SBDD. We show a method for partitioning a single SBDD into two parts that reduces the node count. Among the benchmark functions tested, a node reduction of up to 23% is realized.
Tsutomu SASAO Yuto HORIKAWA Yukihiro IGUCHI
A classification function maps a set of vectors into several classes. A machine learning problem is treated as a design problem for partially defined classification functions. To realize classification functions for MNIST hand written digits, three different architectures are considered: Single-unit realization, 45-unit realization, and 45-unit ×r realization. The 45-unit realization consists of 45 ternary classifiers, 10 counters, and a max selector. Test accuracy of these architectures are compared using MNIST data set.
Atsumu ISENO Yukihiro IGUCHI Tsutomu SASAO
In this paper, we show a method to locate a single stuck-at fault of a random access memory (RAM). From the fail-bitmaps of the RAM, we obtain their Walsh spectrum. For a single stuck-at fault, we show that the fault can be identified and located by using only the 0-th and 1-st coefficients of the spectrum. We also show a circuit to compute these coefficients. The computation time is O(2n), where n is the number of bits in the address of the RAM. The computation time is much shorter than one that uses a logic minimization method.
Yukihiro IGUCHI Tsutomu SASAO Munehiro MATSUURA
In arithmetic circuits for digital signal processing, radixes other than two are often used to make circuits faster. In such cases, radix converters are necessary. However, in general, radix converters tend to be complex. This paper considers design methods for p-nary to binary converters. First, it considers Look-Up Table (LUT) cascade realizations. Then, it introduces a new design technique called arithmetic decomposition by using LUTs and adders. Finally, it compares the amount of hardware and performance of radix converters implemented by FPGAs. 12-digit ternary to binary converters on Cyclone II FPGAs designed by the proposed method are faster than ones by conventional methods.
Tsutomu SASAO Yuta URANO Yukihiro IGUCHI
This paper shows a method to find a linear transformation that reduces the number of variables to represent a given incompletely specified index generation function. It first generates the difference matrix, and then finds a minimal set of variables using a covering table. Linear transformations are used to modify the covering table to produce a smaller solution. Reduction of the difference matrix is also considered.