Kaijie WEI Yuki KUNO Masatoshi ARAI Hideharu AMANO
Stereo depth estimation has become an attractive topic in the computer vision field. Although various algorithms strive to optimize the speed and precision of estimation, energy cost is also an essential metric for an embedded system. Among these algorithms, Semi-Global Matching (SGM) has been a popular choice for real-world applications because of its balance between accuracy and speed. However, its power consumption makes it difficult to apply to an embedded system. Thus, we propose a robust stereo matching system, RT-libSGM, running on Xilinx Field-Programmable Gate Array (FPGA) platforms. The dedicated design of each module optimizes the speed of the entire system while ensuring the flexibility of the system structure. In an evaluation on a Zynq FPGA board called M-KUBOS, RT-libSGM achieves state-of-the-art performance with lower power consumption. Compared with the benchmark design (libSGM) running on the Tegra X2 GPU, RT-libSGM runs more than 2× faster at a much lower energy cost.
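For readers unfamiliar with SGM, the following is a minimal sketch of the standard SGM cost aggregation recurrence along a single left-to-right path. It is not the RT-libSGM or libSGM implementation; the function name `aggregate_scanline` and the penalty values `p1` and `p2` are illustrative assumptions.

```python
def aggregate_scanline(cost, p1=1, p2=4):
    """Aggregate matching costs along one left-to-right path.

    cost[x][d] is the matching cost of pixel x at disparity d.
    p1 penalizes a disparity change of 1, p2 any larger change
    (both values are illustrative assumptions).
    """
    ndisp = len(cost[0])
    agg = [list(cost[0])]  # the first pixel has no predecessor
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(ndisp):
            best = prev[d]                       # same disparity
            if d > 0:
                best = min(best, prev[d - 1] + p1)
            if d < ndisp - 1:
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)      # larger jump
            # Subtracting prev_min keeps the values bounded.
            row.append(cost[x][d] + best - prev_min)
        agg.append(row)
    return agg
```

A full SGM pipeline would sum such aggregated costs over several path directions and then pick, per pixel, the disparity with the smallest total.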
Zhongjian MA Dongzhen HUANG Baoqing LI Xiaobing YUAN
Current stereo matching methods benefit a lot from the precise stereo estimation with Convolutional Neural Networks (CNNs). Nevertheless, patch-based siamese networks rely on the implicit assumption of constant depth within a window, which does not hold for slanted surfaces. Existing methods for handling slanted patches focus on post-processing. In contrast, we propose a novel module for matching cost networks to overcome this bias. Slanted objects appear horizontally stretched between stereo pairs, suggesting that the feature extraction in the horizontal direction should be different from that in the vertical direction. To tackle this distortion, we utilize asymmetric convolutions in our proposed module. Experimental results show that the proposed module in matching cost networks can achieve higher accuracy with fewer parameters compared to conventional methods.
Ming LI Li SHI Xudong CHEN Sidan DU Yang LI
High computational complexity makes stereo matching a major challenge in real-time application scenarios. Stereo matching in a video sequence differs slightly from stereo matching in a still image because temporal correlation exists among video frames. However, no existing method has considered the temporal consistency of disparity for algorithm acceleration. In this work, we propose a scheme called the dynamic disparity range (DDR), which optimizes the matching cost calculation and cost aggregation steps by narrowing the disparity search range, and a scheme called the temporal cost aggregation path, which optimizes the cost aggregation step. Based on these schemes, we propose the DDR-SGM and DDR-MCCNN algorithms for stereo matching in video sequences. Evaluation results show that the proposed algorithms significantly reduce the computational complexity with only a very slight loss of accuracy. The results demonstrate that the proposed optimizations for stereo matching are effective and that temporal consistency in stereo video is highly useful for either improving accuracy or reducing computational complexity.
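The idea of narrowing the disparity search range can be sketched as follows. The abstract does not state how the range is derived, so this example assumes a fixed `margin` around the disparity found in the previous frame; the names `dynamic_range`, `best_disparity`, and `margin` are hypothetical.

```python
def dynamic_range(prev_disp, margin, max_disp):
    """Return a narrowed (lo, hi) search range, clipped to [0, max_disp].

    Assumption: the range is centred on the previous frame's disparity.
    """
    lo = max(0, prev_disp - margin)
    hi = min(max_disp, prev_disp + margin)
    return lo, hi


def best_disparity(costs, prev_disp, margin):
    """Pick the lowest-cost disparity inside the narrowed range only.

    costs[d] is the matching cost of one pixel at disparity d.
    """
    lo, hi = dynamic_range(prev_disp, margin, len(costs) - 1)
    return min(range(lo, hi + 1), key=lambda d: costs[d])
```

With a margin of 1, only three candidates are evaluated per pixel instead of the full range, which is where the saving comes from; the trade-off is that a true disparity outside the window is missed.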
Xiaoqing YE Jiamao LI Han WANG Xiaolin ZHANG
Accurate stereo matching remains a challenging problem in weakly textured areas, at discontinuities, and under occlusions. In this letter, a novel stereo matching method is presented that consists of a feature ensemble network to compute the matching cost, an error detection network to predict outliers, and priority-based occlusion disambiguation for refinement. Experiments on the Middlebury benchmark demonstrate that the proposed method yields competitive results against state-of-the-art algorithms.
Yunlong ZHAN Yuzhang GU Xiaolin ZHANG Lei QU Jiatian PI Xiaoxia HUANG Yingguan WANG Jufeng LUO Yunzhou QIU
Cost aggregation is one of the most important steps in local stereo matching, yet it is difficult to achieve both accuracy and speed. In this letter, a novel cost aggregation method, consisting of a guidance image, a fast aggregation function, and simplified scan-line optimization, is developed. Experiments on 32 Middlebury stereo datasets demonstrate that the proposed algorithm is competitive with state-of-the-art aggregation methods in both accuracy and speed.
Nitin SINGHAL Jin Woo YOO Ho Yeol CHOI In Kyu PARK
In this paper, we analyze the key factors underlying the implementation, evaluation, and optimization of image processing and computer vision algorithms on an embedded GPU using the OpenGL ES 2.0 shader model. First, we present the characteristics of the embedded GPU and its inherent advantages over an embedded CPU. Additionally, we propose techniques to achieve increased performance with optimized shader design. To show the effectiveness of the proposed techniques, we employ cartoon-style non-photorealistic rendering (NPR), speeded-up robust feature (SURF) detection, and stereo matching as example algorithms. Performance is evaluated in terms of the execution time and the speed-up achieved in comparison with an implementation on the embedded CPU.
Chenbo SHI Guijin WANG Xiaokang PEI Bei HE Xinggang LIN
In this paper, we propose an interleaving updating framework of disparity and confidence map (IUFDCM) for stereo matching to eliminate redundant and interfering information from unreliable pixels. In contrast to other propagation algorithms that use matching cost as messages, IUFDCM updates the disparity map and the confidence map in an interleaving manner. Based on the Confidence-based Support Window (CSW), the disparity map is updated adaptively to reduce the influence of input parameters. Unreliable pixels are reassigned using reliable messages, which makes them more likely to recover the ground truth. The confidence map is then updated according to the previous disparity map and the left-right consistency. The top ranks on the Middlebury benchmark for different error thresholds demonstrate that our algorithm is competitive with the best current stereo matching algorithms.
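The left-right consistency mentioned above is commonly checked as in the sketch below, shown for a single image row. This is the generic check, not the IUFDCM confidence update itself; the function name `lr_consistent` and the tolerance `tol` are assumptions.

```python
def lr_consistent(disp_left, disp_right, tol=1):
    """Flag pixels of one row whose left and right disparities agree.

    A left pixel x with disparity d corresponds to right pixel x - d.
    The pixel is consistent if that right pixel exists and its own
    disparity differs from d by at most tol.
    """
    width = len(disp_left)
    mask = []
    for x, d in enumerate(disp_left):
        xr = x - d
        ok = 0 <= xr < width and abs(d - disp_right[xr]) <= tol
        mask.append(ok)
    return mask
```

Pixels that fail the check are typically treated as unreliable (for example, occluded or mismatched) and handled in a later refinement step.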
Chenbo SHI Guijin WANG Xiaokang PEI Bei HE Xinggang LIN
This paper addresses stereo matching in scenes with smooth regions and strongly slanted planes. We explore the flexible handling of color disparity, spatial relation, and the reliability of matching pixels in support windows. Building upon these key ingredients, a robust stereo matching algorithm using local plane fitting by Confidence-based Support Window (CSW) is presented. For each CSW, only those pixels with high confidence are employed to estimate the optimal disparity plane. Because RANSAC has been shown to be robust in suppressing the disturbance caused by outliers, we employ it to solve the local plane fitting problem. Compared with state-of-the-art local methods in the computer vision community, our approach achieves better performance and time efficiency on the Middlebury benchmark.
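As an illustration of RANSAC-based disparity plane fitting, the sketch below fits d = a*x + b*y + c to a set of (x, y, d) samples. It is a generic RANSAC loop, not the authors' CSW-based procedure; the input points are assumed to be the high-confidence pixels already selected, and `iters`, `tol`, and `seed` are hypothetical parameters.

```python
import random


def plane_from_points(p, q, r):
    """Plane d = a*x + b*y + c through three (x, y, d) points.

    Returns None when the points are collinear in the image plane.
    """
    (x1, y1, d1), (x2, y2, d2), (x3, y3, d3) = p, q, r
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if abs(det) < 1e-12:
        return None
    a = ((d2 - d1) * (y3 - y1) - (d3 - d1) * (y2 - y1)) / det
    b = ((x2 - x1) * (d3 - d1) - (x3 - x1) * (d2 - d1)) / det
    c = d1 - a * x1 - b * y1
    return a, b, c


def ransac_plane(points, iters=200, tol=0.5, seed=0):
    """Return the (a, b, c) supported by the most inliers, or None."""
    rng = random.Random(seed)
    best_plane, best_count = None, -1
    for _ in range(iters):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate sample, try again
        a, b, c = plane
        # Count points whose disparity lies within tol of the plane.
        count = sum(1 for x, y, d in points
                    if abs(a * x + b * y + c - d) <= tol)
        if count > best_count:
            best_plane, best_count = plane, count
    return best_plane
```

Because each hypothesis is built from only three points, a few gross outliers cannot drag the fit the way they would in a least-squares solution over all points.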
In this paper, we deal with the pedestrian detection task in outdoor scenes. Because of the complexity of such scenes, commonly used gradient-feature-based detectors do not work well on them. We propose to use sparse 3D depth information as an additional cue for detection, in order to improve performance at low additional cost. Our proposed method uses a probabilistic model to integrate image-feature-based classification with sparse depth estimation. Using the depth estimates, we map the prior distribution of actual human height onto the image and update the image-feature-based classification result probabilistically. This paper makes two contributions: 1) a simplified graphical model that efficiently integrates the depth cue into detection; and 2) a sparse depth estimation method that provides fast and reliable depth estimates. An experiment shows that our method provides a promising enhancement over the baseline detector with minimal additional time.
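One way to picture mapping a height prior onto the image is the sketch below. It assumes a pinhole camera (image height = focal length × real height / depth) and a Gaussian prior on human height; neither is stated in the abstract, and the default mean of 1.7 m and standard deviation of 0.1 m are purely illustrative.

```python
import math


def expected_pixel_height(height_m, depth_m, focal_px):
    """Pinhole projection of a real-world height to image pixels."""
    return focal_px * height_m / depth_m


def height_likelihood(box_height_px, depth_m, focal_px,
                      mean_m=1.7, std_m=0.1):
    """Gaussian density of a detection's pixel height at a given depth.

    The prior on real height (mean_m, std_m) is projected into the
    image at depth_m; the values used here are assumptions.
    """
    mu = expected_pixel_height(mean_m, depth_m, focal_px)
    sigma = expected_pixel_height(std_m, depth_m, focal_px)
    z = (box_height_px - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```

A detection window whose pixel height is implausible for its estimated depth receives a low likelihood, which could then be combined with the classifier score.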
Guang TIAN Feihu QI Masatoshi KIMACHI Yue WU Takashi IKETANI
This paper presents a 3D feature-based binocular tracking algorithm for tracking people in crowded indoor scenes. The algorithm consists of a two-stage 3D feature point grouping method and a robust 3D feature-based tracking method. The two-stage grouping method uses a kernel-based ISODATA method to detect people accurately even when partial or almost full occlusion occurs among people in the surveillance area. The robust 3D feature-based tracking method combines the interacting multiple model (IMM) method with a cascade multiple-feature data association method. It not only manages the generation and disappearance of trajectories, but also handles interactions between people and tracks maneuvering people. Experimental results demonstrate the robustness and efficiency of the proposed framework. It runs in real time and is not sensitive to variation in the frame-to-frame interval. It can also deal with occlusion among people and performs well when people rotate and wriggle.
Osafumi NAKAYAMA Morito SHIOHARA Shigeru SASAKI Tomonobu TAKASHIMA Daisuke UENO
During the period from dusk to dark, when it is difficult for drivers to see other vehicles, or when visibility is poor due to rain, snow, etc., the contrast between nearby vehicles and the background is lower. Under such conditions, conventional surveillance systems have difficulty detecting the outline of nearby vehicles and may thus fail to recognize them. To solve this problem, we have developed a rear and side surveillance system for vehicles that uses image processing. The system uses two stereo cameras to monitor the areas to the rear and sides of a vehicle, i.e., a driver's blind spots, and to detect the positions and relative speeds of other vehicles. The proposed system can estimate the shape of a vehicle from a partial outline of it, thus identifying the vehicle by filling in the missing parts of the vehicle outline. Testing of the system under various environmental conditions showed that the rate of errors (false and missed detection) in detecting approaching vehicles was reduced to less than 10%, even under conditions that are problematic for conventional processing.
Jeong-Hoon KIM Jun-Young LEE Myoung-Ho LEE
This letter proposes a 3-D stereo endoscopic image processing system. Whereas a conventional 3-D stereo endoscopic system has only simple monitoring functions, the proposed system gives doctors accurate depth perception by providing depth value information, visualization, and a stereo PACS viewer to aid education, accurate diagnosis, and surgical operation, with further application to robotic surgery.
Takashi IMORI Tadahiko KIMOTO Bunpei TOUJI Toshiaki FUJII Masayuki TANIMOTO
This paper presents a new scheme to estimate depth in a natural three-dimensional scene using a multi-viewpoint image set. In the conventional Multiple-Baseline Stereo (MBS) scheme for the image set, although stereo matching errors are somewhat reduced by using multiple stereo pairs, the use of square blocks of fixed size sometimes causes false matching, especially in image areas where occlusion occurs and in image areas with small variance of brightness levels. In the proposed scheme, the reference image is segmented into arbitrarily shaped regions, and a depth value is estimated for each region. In addition, by comparing the image generated by projection with the original image, depth values are re-estimated in a top-down manner, so that errors in the previous depth values are detected and corrected. Experimental results show the advantages of the proposed scheme over the MBS scheme.
Satoshi NAKAGAWA Takahiro WATANABE Yuji KUNO
This paper describes a new feature extraction model (Active Model) that extends the active contour model (Snakes). Active Model can be applied to various energy-minimizing models because it integrates most of the energy terms proposed so far into one model and also provides new terms for multiple images, such as motion and stereo images. The computational order of the energy minimization process is estimated in comparison with a dynamic programming method and a greedy algorithm, and it is shown that the energy minimization process in Active Model is faster than the others. Some experimental results are also shown.