Many parallel Fast Fourier Transform (FFT) algorithms adopt a multi-stage architecture to increase performance. However, data permutation between stages consumes considerable memory and processing time. This paper proposes an FFT array processing mapping algorithm to overcome this drawback. In this algorithm, an arbitrary number 2^k of butterfly units (BUs) can be scheduled to work in parallel on n=2^s data points (k=0,1,...,s-1). Because no inter-stage data transfer is required, memory consumption and system latency are both greatly reduced. Moreover, as the number of BUs increases, not only does throughput increase linearly, but system latency also decreases linearly. This array-processing-oriented architecture provides a flexible tradeoff between hardware cost and system performance. In theory, the system latency is (s
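The scheduling idea the abstract describes can be illustrated with a small software sketch. This is not the authors' hardware implementation; it is a hypothetical model of an in-place iterative radix-2 FFT in which the n/2 butterflies of each stage are partitioned evenly across 2^k simulated BUs. The slices are mutually independent, which is why doubling the BU count halves the per-stage latency on real hardware, and why no inter-stage permutation memory is needed (the transform is computed in place).

```python
import cmath

def fft_parallel_sketch(x, num_bus):
    """In-place iterative radix-2 FFT over n = 2^s points.

    The n/2 butterflies of each stage are split evenly across
    `num_bus` (= 2^k, k <= s-1) simulated butterfly units (BUs).
    Here the "units" run sequentially, but each slice is
    independent, so on parallel hardware the per-stage latency
    scales as (n/2)/num_bus butterfly operations.
    """
    n = len(x)
    s = n.bit_length() - 1
    assert 1 << s == n, "length must be a power of two"
    # Bit-reversal permutation (standard for in-place radix-2 FFT).
    a = [x[int(format(i, f'0{s}b')[::-1], 2)] for i in range(n)]
    for stage in range(s):
        m = 1 << (stage + 1)           # butterfly span at this stage
        half = m >> 1
        # Enumerate the n/2 butterflies of this stage as index pairs.
        pairs = [(j + k, j + k + half)
                 for j in range(0, n, m) for k in range(half)]
        chunk = len(pairs) // num_bus  # butterflies per BU
        for bu in range(num_bus):      # each BU processes its own slice
            for p, q in pairs[bu * chunk:(bu + 1) * chunk]:
                # Twiddle factor W_m^k, where k = p mod m.
                w = cmath.exp(-2j * cmath.pi * (p % m) / m)
                t = w * a[q]
                a[q] = a[p] - t
                a[p] = a[p] + t
    return a
```

Under this model the total latency is s stages times (n/2)/2^k butterfly operations per stage, which shows the linear latency reduction with increasing BU count that the abstract claims; the exact latency formula of the paper is truncated above and is not reproduced here.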
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Zhenyu LIU, Yang SONG, Takeshi IKENAGA, Satoshi GOTO, "A VLSI Array Processing Oriented Fast Fourier Transform Algorithm and Hardware Implementation" in IEICE TRANSACTIONS on Fundamentals,
vol. E88-A, no. 12, pp. 3523-3530, December 2005, doi: 10.1093/ietfec/e88-a.12.3523.
URL: https://globals.ieice.org/en_transactions/fundamentals/10.1093/ietfec/e88-a.12.3523/_p
@ARTICLE{e88-a_12_3523,
author={Zhenyu LIU and Yang SONG and Takeshi IKENAGA and Satoshi GOTO},
journal={IEICE TRANSACTIONS on Fundamentals},
title={A VLSI Array Processing Oriented Fast Fourier Transform Algorithm and Hardware Implementation},
year={2005},
volume={E88-A},
number={12},
pages={3523-3530},
abstract={Many parallel Fast Fourier Transform (FFT) algorithms adopt a multi-stage architecture to increase performance. However, data permutation between stages consumes considerable memory and processing time. This paper proposes an FFT array processing mapping algorithm to overcome this drawback. In this algorithm, an arbitrary number 2^k of butterfly units (BUs) can be scheduled to work in parallel on n=2^s data points (k=0,1,...,s-1). Because no inter-stage data transfer is required, memory consumption and system latency are both greatly reduced. Moreover, as the number of BUs increases, not only does throughput increase linearly, but system latency also decreases linearly. This array-processing-oriented architecture provides a flexible tradeoff between hardware cost and system performance. In theory, the system latency is (s
keywords={},
doi={10.1093/ietfec/e88-a.12.3523},
ISSN={},
month={December},}
TY - JOUR
TI - A VLSI Array Processing Oriented Fast Fourier Transform Algorithm and Hardware Implementation
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 3523
EP - 3530
AU - Zhenyu LIU
AU - Yang SONG
AU - Takeshi IKENAGA
AU - Satoshi GOTO
PY - 2005
DO - 10.1093/ietfec/e88-a.12.3523
JO - IEICE TRANSACTIONS on Fundamentals
SN -
VL - E88-A
IS - 12
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - December 2005
AB - Many parallel Fast Fourier Transform (FFT) algorithms adopt a multi-stage architecture to increase performance. However, data permutation between stages consumes considerable memory and processing time. This paper proposes an FFT array processing mapping algorithm to overcome this drawback. In this algorithm, an arbitrary number 2^k of butterfly units (BUs) can be scheduled to work in parallel on n=2^s data points (k=0,1,...,s-1). Because no inter-stage data transfer is required, memory consumption and system latency are both greatly reduced. Moreover, as the number of BUs increases, not only does throughput increase linearly, but system latency also decreases linearly. This array-processing-oriented architecture provides a flexible tradeoff between hardware cost and system performance. In theory, the system latency is (s
ER -