1-4hit |
Qianjian XING Feng YU Xiaobo YIN Bei ZHAO
In this letter, we present a radix-R regular interconnection pattern family of factorizations for the WHT-FFT with identical stage-to-stage interconnection pattern in a unified form, where R is any power of 2. This family of algorithms has identical sparse matrix factorization in each stage and can be implemented in a merged butterfly structure, which conduce to regular and efficient memory managing scalable to high radices. And in each stage, the butterflies with same twiddle factor set are aggregated together, which can reduce the twiddle factor evaluations or accesses to the lookup table. The kinds of factorization can also be extended to FFT, WHT and SCHT with identical stage-to-stage interconnection pattern.
Qianjian XING Zhenguo MA Feng YU
This letter presents a novel memory-based architecture for radix-2 fast Walsh-Hadamard-Fourier transform (FWFT) based on the constant geometry FWFT algorithm. It is composed of a multi-function Processing Engine, a conflict-free memory addressing scheme and an efficient twiddle factor generator. The address for memory access and the control signals for stride permutation are formulated in detail and the methods can be applied to other memory-based FFT-like architectures.
Tingting CHEN Weijun LI Feng YU Qianjian XING
A modular serial pipelined sorting architecture for continuous input sequences is presented. It supports continuous sequences, whose lengths can be dynamically changed, and does so using a very simple control strategy. It consists of identical serial cascaded sorting cells, and lends itself to high frequency implementation with any number of sorting cells, because both data and control signals are pipelined. With L cascaded sorting cells, it produces a fully sorted result for sequences whose length N is equal to or less than L+1; for longer sequences, the largest L elements are sorted out. Being modularly designed, several independent smaller sorters can be dynamically configured to form a larger sorter.
Min YUAN Qianjian XING Zhenguo MA Feng YU Yingke XU
In this letter, we present a novel single-precision floating-point multiply-accumulator (FNA-MAC) to achieve lower hardware resource, reduced computing latency and improved computing accuracy for continuous dot product operations. By further fusing the normalization and alignment in the traditional FMA algorithm, the proposed architecture eliminates the first N-1 normalization and rounding operations for an N-point dot product, and preserves the precision of interim results in a significant bit size that is twice of that in the traditional methods. The normalization and rounding of the final result is processed at the cost of consuming an additional multiply-add operation. The simulation results show that the improvement in computational accuracy is significant. Meanwhile, when comparing to a recently published FMA design, the proposed FNA-MAC can reduce the slice look-up table/flip-flop resource and computing latency by a fact of 18%, 33.3%, respectively.