IEICE globals.ieice.org Site

Author Search Result

[Author] Chin-Long WEY (5 hits)


Efficient Algorithm and Fast Hardware Implementation for Multiply-by-(1+2^k)
Chin-Long WEY Ping-Chang JUI Muh-Tian SHIUE

PAPER-VLSI Design Technology and CAD

Vol: E98-A  No: 4  Page(s): 966-974
A constant multiplier multiplies a data input by a constant value. Constant multipliers are essential components in various types of arithmetic circuits, such as filters in digital signal processor (DSP) units, and they are prevalent in modern VLSI designs. This study presents an efficient algorithm and fast hardware implementation for performing the multiply-by-(1+2^k) operation with additions. No multiplications are needed. The value of (1+2^k)N can be computed by adding N to its k-bit left-shifted value 2^k N. The additions can be performed by the full-adder-based (FA-based) ripple carry adder (RCA) for a simple architecture. This paper introduces the unit cells for additions (UCAs) to construct the UCA-based RCA, which is 35% faster than the FA-based RCA. Further, to improve the speed performance, a simple and modular hybrid adder is presented with the proposed UCA concept, where the carry lookahead adder (CLA) serves as a module and multiple CLA modules are serially connected in a fashion similar to the RCA. Results show that the hybrid adder significantly improves the speed performance.
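The shift-and-add identity in this abstract, (1+2^k)N = N + 2^k N, can be illustrated with a minimal Python sketch. It models a conventional full-adder ripple carry chain in software; it is not the UCA cell proposed in the paper, whose internals the abstract does not describe. The function names and the `width` parameter are hypothetical and chosen only for this illustration.

```python
def ripple_carry_add(a: int, b: int, width: int) -> int:
    """Add two non-negative integers bit by bit, the way a chain of
    full adders would, and return the sum including the final carry."""
    carry = 0
    result = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        s = x ^ y ^ carry                    # full-adder sum bit
        carry = (x & y) | (carry & (x ^ y))  # full-adder carry-out
        result |= s << i
    return result | (carry << width)


def multiply_by_one_plus_pow2(n: int, k: int, width: int) -> int:
    """Compute (1 + 2**k) * n for a width-bit non-negative n using only
    a left shift and one addition (no multiplication)."""
    shifted = n << k                         # 2**k * n
    # The shifted operand occupies width + k bits, so add over that many.
    return ripple_carry_add(n, shifted, width + k)


if __name__ == "__main__":
    print(multiply_by_one_plus_pow2(13, 3, 8))  # (1 + 8) * 13
```

The adder is written as an explicit bit loop only to mirror the ripple structure; in ordinary software `n + (n << k)` gives the same value.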
A Low-Cost Continuous-Flow Parallel Memory-Based FFT Processor for UWB Applications
Chin-Long WEY Shin-Yo LIN Hsu-Sheng WANG Hung-Lieh CHEN Chun-Ming HUANG

PAPER-VLSI Design Technology and CAD

Vol: E94-A  No: 1  Page(s): 315-323
In UWB systems, data symbols are transmitted and received continuously. The Fast Fourier Transform (FFT) processor must be able to seamlessly process input/output data. This paper presents the design and implementation of a continuous data flow parallel memory-based FFT (CF-PMBFFT) processor without the use of an input buffer for pre-loading the input data. The processor uses a memory space of two N-words and multiple processing elements (PEs) to achieve seamless data flow and meet the design requirement. The circuit has been fabricated in a TSMC 0.18 µm 1P6M CMOS process with a supply voltage of 1.8 V. Measurement results of the test chip show that the developed CF-PMBFFT processor takes a core area of 1.97 mm² with a power consumption of 62.12 mW for a throughput rate of 528 MS/s.
Design Methodology for Yield Enhancement of Switched-Capacitor Analog Integrated Circuits
Pei-Wen LUO Jwu-E CHEN Chin-Long WEY

PAPER-VLSI Design Technology and CAD

Vol: E94-A  No: 1  Page(s): 352-361
Device mismatch plays an important role in the design of accurate analog circuits. The common centroid structure is commonly employed to reduce device mismatches caused by symmetrical layouts and processing gradients. Among the candidate placements generated by the common centroid approach, however, it is generally difficult to determine which one achieves better matching without performing the time-consuming yield evaluation process. In addition, this rule-based methodology makes it difficult to achieve acceptable matching between multiple capacitors and to handle an irregular layout area. Based on a spatial correlation model, this study proposes a design methodology for yield enhancement of analog circuits using switched-capacitor techniques. An efficient and effective placement generator is developed to derive a placement for a circuit that achieves the highest or near-highest correlation coefficient and thus a better yield performance. A simple yield analysis is also developed to evaluate the achieved yield performance of a derived placement. Results show that the proposed methodology derives a placement which achieves better yield performance than those generated by the common centroid approach.
Reconfigurable Homogenous Multi-Core FFT Processor Architectures for Hybrid SISO/MIMO OFDM Wireless Communications
Chin-Long WEY Shin-Yo LIN Pei-Yun TSAI Ming-Der SHIEH

PAPER-VLSI Design Technology and CAD

Vol: E94-A  No: 7  Page(s): 1530-1539
Multi-core processors have been attracting a great deal of attention. In the domain of signal processing for communications, the current trends toward rapidly evolving standards and formats, and toward algorithms adaptive to dynamic factors in the environment, require programmable solutions that possess both algorithm flexibility and low implementation complexity. Reconfigurable architectures have demonstrated better tradeoffs between algorithm flexibility, implementation complexity, and energy efficiency. This paper presents a reconfigurable homogeneous memory-based FFT processor (MBFFT) architecture integrated in a single chip to support hybrid SISO/MIMO OFDM wireless communication systems. For example, a reconfigurable MBFFT processor with eight processing elements (PEs) can be configured for one DVB-T/H with N=8192 and two 802.11n with N=128. The reconfigurable processors are well suited to Software Defined Radio (SDR) applications, which require more hardware flexibility.
Efficient Multiply-by-3 and Divide-by-3 Algorithms and Their Fast Hardware Implementation
Chin-Long WEY Ping-Chang JUI Gang-Neng SUNG

PAPER-VLSI Design Technology and CAD

Vol: E97-A  No: 2  Page(s): 616-623
This study presents efficient algorithms for performing multiply-by-3 (3N) and divide-by-3 (N/3) operations with additions and subtractions, respectively. No multiplications or divisions are needed. Full adders (FAs) and full subtractors (FSs) can be used to realize the 3N and N/3 operations, respectively. For fast hardware implementation, this paper introduces two basic cells, UCA and UCS, for the 3N and N/3 operations, respectively. For the 3N operation, UCA-based ripple carry adder (RCA) and carry lookahead adder (CLA) designs are proposed, and their speed performances are estimated based on the delay data of a standard cell library in a TSMC 0.18 µm CMOS process. Results show that the 16-bit UCA-based RCA is about 3 times faster than the conventional FA-based RCA and even 25% faster than the FA-based CLA. The proposed 16-bit and 64-bit UCA-based CLAs are 62% and 36% faster than the conventional FA-based CLAs, respectively. For the N/3 operation, a ripple borrow subtractor (RBS) is also presented. The 16-bit UCS-based RBS is about 15.5% faster than the 16-bit FS-based RBS.
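As a rough illustration of how 3N and N/3 can be obtained with additions and subtractions only, the Python sketch below uses the identities 3N = N + 2N and, for an exact multiple N = 3Q, Q = N - 2Q. The second identity lets Q be produced one bit at a time from the least significant bit with a full-subtractor step, which resembles a ripple borrow chain. This is an assumption made for illustration: the abstract does not give the paper's algorithm or the UCA/UCS cell designs, the sketch only handles N that is an exact multiple of 3, and the function names are hypothetical.

```python
def multiply_by_3(n: int) -> int:
    """3N as N plus its 1-bit left-shifted value (no multiplication)."""
    return n + (n << 1)


def divide_by_3_exact(n: int, width: int) -> int:
    """Return Q = N / 3 for a non-negative N that is an exact multiple
    of 3 and fits in `width` bits, using bitwise subtraction only.

    Since N = Q + 2Q, we have Q = N - 2Q. Bit i of 2Q is bit i-1 of Q,
    so each quotient bit depends only on bits already computed."""
    q = 0
    borrow = 0
    prev_q_bit = 0  # bit i-1 of Q, which is bit i of 2Q
    for i in range(width):
        n_bit = (n >> i) & 1
        diff = n_bit - prev_q_bit - borrow   # full-subtractor step
        q_bit = diff & 1                     # difference bit
        borrow = 1 if diff < 0 else 0        # borrow-out
        q |= q_bit << i
        prev_q_bit = q_bit
    return q


if __name__ == "__main__":
    print(multiply_by_3(7))
    print(divide_by_3_exact(multiply_by_3(7), 16))
```

For inputs that are not multiples of 3 this sketch does not return the floor quotient; handling remainders would need additional logic not covered here.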
