1-3hit |
Jungang GUAN Fengwei AN Xiangyu ZHANG Lei CHEN Hans Jürgen MATTAUSCH
Efficient road-lane detection is expected to be achievable by application of the Hough transform (HT) which realizes high-accuracy straight-line extraction from images. The main challenge for HT-hardware implementation in actual applications is the trade-off optimization between accuracy maximization, power-dissipation reduction and real-time requirements. We report a HT-hardware architecture for road-lane detection with parallelized voting procedure, local maximum algorithm and FPGA-prototype implementation. Parallelization of the global design is realized on the basis of θ-value discretization in the Hough space. Four major hardware modules are developed for edge detection in the original video frames, computation of the characteristic edge-pixel values (ρ,θ) in Hough-space, voting procedure for each (ρ,θ) pair with parallel local-maximum-based peak voting-point extraction in Hough space to determine the detected straight lines. Implementation of a prototype system for real-time road-lane detection on a low-cost DE1 platform with a Cyclone II FPGA device was verified to be possible. An average detection speed of 135 frames/s for VGA (640x480)-frames was achieved at 50 MHz working frequency.
Fengwei AN Lei CHEN Toshinobu AKAZAWA Shogo YAMASAKI Hans Jürgen MATTAUSCH
Nearest-neighbor-search classifiers are attractive but they have high intrinsic computational demands which limit their practical application. In this paper, we propose a coprocessor for k (k with k≥1) nearest neighbor (kNN) classification in which squared Euclidean distances (SEDs) are mapped into the clock domain for realizing high search speed and energy efficiency. The minimal SED searching is carried out by weighted frequency dividers that drastically reduce the normally exponential increase of the worst-case search-clock number with the bit width of vector components to only a linear increase. This also results in low power dissipation and high area-efficiency in comparison to the traditional method using large numbers of adders and comparators. The kNN classifier determines the class of an unknown input sample with a majority decision among the k nearest reference samples. The required majority-decision circuit is integrated with the clock-mapping-based minimal-SED searching architecture and proceeds with the classification immediately after identification of each of the k nearest references. A test chip in 180 nm CMOS technology, which can process 8 dimensions of 32 reference vectors in parallel, achieves low power dissipation of 40.32 mW (at 51.21 MHz clock frequency and 1.8 V supply voltage). Significantly, the distance search circuit consumes only 5.99 mW. Feature vectors with different dimensionality up to 2048 dimensions can be handled by the designed coprocessor due to a dimension extension circuit, enabling large flexibility for usage in different application.
Fengwei AN Tetsushi KOIDE Hans Jürgen MATTAUSCH
In this paper, we propose a hardware solution for overcoming the problem of high computational demands in a nearest neighbor (NN) based multi-prototype learning system. The multiple prototypes are obtained by a high-speed K-means clustering algorithm utilizing a concept of software-hardware cooperation that takes advantage of the flexibility of the software and the efficiency of the hardware. The one nearest neighbor (1-NN) classifier is used to recognize an object by searching for the nearest Euclidean distance among the prototypes. The major deficiency in conventional implementations for both K-means and 1-NN is the high computational demand of the nearest neighbor searching. This deficiency is resolved by an FPGA-implemented coprocessor that is a VLSI circuit for searching the nearest Euclidean distance. The coprocessor requires 12.9% logic elements and 58% block memory bits of an Altera Stratix III E110 FPGA device. The hardware communicates with the software by a PCI Express (4) local-bus-compatible interface. We benchmark our learning system against the popular case of handwritten digit recognition in which abundant previous works for comparison are available. In the case of the MNIST database, we could attain the most efficient accuracy rate of 97.91% with 930 prototypes, the learning speed of 1.310-4 s/sample and the classification speed of 3.9410-8 s/character.