In this paper, we propose a novel decomposition method to segment multiple object regions simultaneously in cluttered videos. The method formulates object region segmentation as a labeling problem in which we assign object IDs to the superpixels in a sequence of video frames so that the unary color matching cost is low, the assignment induces compact segments, and the superpixel labeling is consistent through time. Multi-object segmentation in a video is a combinatorial problem, for which we propose a binary linear formulation. Since the integer linear program is hard to solve directly, we relax it and further decompose the relaxation into a sequence of much simpler max-flow problems. The proposed method is guaranteed to converge in a finite number of steps to the global optimum of the relaxation. It also has a high chance of obtaining an all-integer solution, in which case it achieves the global optimum of the original problem. Rounding the relaxation result gives an N-approximation solution, where N is the number of objects. Compared with directly solving the integer program, the novel decomposition method speeds up the computation by orders of magnitude. Our experiments show that the proposed method is robust against object pose variation and occlusion, and is more accurate than the competing methods while remaining efficient.
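The abstract above reduces its relaxation to a sequence of max-flow problems but does not give the construction. As a rough illustration only, the sketch below solves a single two-label (object versus background) subproblem by min-cut on a chain of four superpixels. The function name `min_cut_labels`, the cost values, and the uniform smoothness weight are all assumptions made for this example; this is a textbook s-t cut, not the authors' decomposition.

```python
from collections import deque

def min_cut_labels(cost_fg, cost_bg, edges, smooth):
    """Two-label labeling by s-t min-cut (Edmonds-Karp max-flow).

    Returns 1 (object) for nodes left on the source side, else 0.
    All inputs are hypothetical illustration values.
    """
    n = len(cost_fg)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        cap[s][i] = cost_bg[i]  # cut when i is labeled background
        cap[i][t] = cost_fg[i]  # cut when i is labeled object
    for i, j in edges:          # cut when neighbors disagree
        cap[i][j] += smooth
        cap[j][i] += smooth

    def bfs():
        # Breadth-first search over edges with remaining capacity.
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = bfs()
        if parent[t] == -1:
            break
        # Find the bottleneck along the augmenting path, then push flow.
        flow, v = float("inf"), t
        while v != s:
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            cap[parent[v]][v] -= flow
            cap[v][parent[v]] += flow
            v = parent[v]

    reachable = bfs()
    return [1 if reachable[i] != -1 else 0 for i in range(n)]

cost_fg = [1, 1, 5, 1]
cost_bg = [9, 9, 4, 9]
chain = [(0, 1), (1, 2), (2, 3)]
unary_only = [1 if f < b else 0 for f, b in zip(cost_fg, cost_bg)]
with_smoothness = min_cut_labels(cost_fg, cost_bg, chain, smooth=2)
print(unary_only, with_smoothness)
```

With these numbers the unary costs alone would isolate the third superpixel as background, while the smoothness term makes the whole chain one compact object segment.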
Most unsupervised video segmentation algorithms have difficulty extracting objects in dynamic real-world scenes with large displacements, because the foreground hypothesis is often initialized with no explicit mutual constraint on top-down spatio-temporal coherency, even though such a constraint may be imposed on the segmentation objective. To handle such situations, we propose a multiscale saliency flow (MSF) model that jointly learns both foreground and background features of multiscale salient evidence, allowing temporally coherent top-down information in one frame to be propagated throughout the remaining frames. In particular, the top-down evidence is detected by combining saliency signatures within a certain range of higher scales of approximation coefficients in the wavelet domain. Saliency flow is then estimated by Gaussian kernel correlation of non-maximum-suppressed multiscale evidence, which is characterized by HOG descriptors in a high-dimensional feature space. We build the proposed MSF model in accordance with the primary object hypothesis, which jointly integrates temporal consistency constraints of saliency maps estimated at multiple scales into the objective. We demonstrate the effectiveness of the proposed multiscale saliency flow for segmenting dynamic real-world scenes with large displacements caused by uniform sampling of video sequences.
Segmenting foreground objects in unconstrained dynamic scenes remains a difficult problem. We present a novel unsupervised segmentation approach that allows robust object segmentation of dynamic scenes with large displacements. To make this possible, we project motion-based foreground region hypotheses generated via standard optical flow onto visual saliency regions. The motion hypotheses correspond to inside-seed mappings of the motion boundary. For visual saliency, we generalize the image signature method from images to videos to delineate the saliency mapping of object proposals. The mapping of image signatures estimated in the Discrete Cosine Transform (DCT) domain favors regions that stand out to the human visual system. We leverage a Markov random field built on superpixels to impose both spatial and temporal consistency constraints on the motion-saliency combined segments. Projecting salient regions via an image signature with inside mapping seeds facilitates segmenting ambiguous objects from unconstrained dynamic scenes in the presence of large displacements. We demonstrate the performance on fourteen challenging unconstrained dynamic scenes, compare our method with two state-of-the-art unsupervised video segmentation algorithms, and provide quantitative and qualitative performance comparisons.
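For readers unfamiliar with the image signature mentioned above, the single-image version is usually described as keeping only the sign of each DCT coefficient, inverting the transform, and squaring the result. The sketch below follows that description on a tiny grayscale grid in pure Python. The function name `signature_saliency` and the test image are assumptions for illustration, the usual Gaussian smoothing of the map is omitted for brevity, and the extension to video described in the abstract is not attempted.

```python
import math

def _dct_basis(length):
    """Orthonormal DCT-II basis: rows are frequencies, columns are samples."""
    rows = []
    for k in range(length):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / length)
        rows.append([scale * math.cos(math.pi * (n + 0.5) * k / length)
                     for n in range(length)])
    return rows

def _matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def _transpose(a):
    return [list(row) for row in zip(*a)]

def signature_saliency(img):
    """Squared reconstruction from the sign of the 2-D DCT (no smoothing)."""
    basis_h = _dct_basis(len(img))
    basis_w = _dct_basis(len(img[0]))
    # Forward 2-D DCT: C_h * img * C_w^T
    coeffs = _matmul(_matmul(basis_h, img), _transpose(basis_w))
    # Image signature: keep only the sign of every coefficient.
    signs = [[(v > 0) - (v < 0) for v in row] for row in coeffs]
    # Inverse 2-D DCT of the signs: C_h^T * signs * C_w
    recon = _matmul(_matmul(_transpose(basis_h), signs), basis_w)
    return [[v * v for v in row] for row in recon]

# Hypothetical 8x8 frame: flat background with one bright pixel.
frame = [[0.2] * 8 for _ in range(8)]
frame[2][5] = 1.0
saliency = signature_saliency(frame)
peak = max(((r, c) for r in range(8) for c in range(8)),
           key=lambda rc: saliency[rc[0]][rc[1]])
print(peak)  # (2, 5): the map peaks at the bright pixel
```

The quadratic-time matrix products are only meant for toy sizes; a real implementation would use a fast DCT routine from a numerical library.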
Sungchan OH Hyug-Jae LEE Gyeonghwan KIM
This letter presents a method of adding a virtual halo effect to an object of interest in video sequences. A modified graph-cut segmentation algorithm extracts object layers. The halo is modeled by the accumulation of gradually changing Gaussians. With a synthesized blooming effect, the experimental results show that the proposed method conveys a realistic halo effect.
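The letter does not state how the Gaussians change or how they are accumulated. One plausible reading, sketched below purely as an assumption, is a radial intensity profile built as a weighted sum of Gaussians whose widths grow while their weights shrink. The function name `halo_profile`, the sigma schedule, and the decay factor are hypothetical.

```python
import math

def halo_profile(distance, sigmas=(1.0, 2.0, 4.0, 8.0), decay=0.5):
    """Halo brightness at a given distance from the object boundary.

    Sums Gaussians of increasing width with geometrically decaying
    weights, normalized so that the brightness at distance 0 is 1.
    The schedule is an illustrative assumption, not the letter's model.
    """
    weights = [decay ** i for i in range(len(sigmas))]
    total = sum(weights)
    accumulated = sum(w * math.exp(-distance ** 2 / (2.0 * s * s))
                      for w, s in zip(weights, sigmas))
    return accumulated / total

profile = [halo_profile(d) for d in range(0, 12, 2)]
print([round(v, 3) for v in profile])
```

The narrow Gaussians give a bright core near the boundary, and the wide ones leave a faint glow that fades slowly with distance.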
Yoshiki YUNBE Masayuki MIYAMA Yoshio MATSUDA
This paper describes an affine motion estimation processor for real-time video segmentation. The processor estimates the dominant motion of a target region with affine parameters. The processor is based on the Pseudo-M-estimator algorithm. Introduction of an image division method and a binary weight method to the original algorithm reduces data traffic and hardware costs. A pixel sampling method is proposed that reduces the clock frequency by 50%. The pixel pipeline architecture and a frame overlap method double throughput. The processor was prototyped on an FPGA; its function and performance were subsequently verified. It was also implemented as an ASIC. The core size is 5.0 × 5.0 mm² in a 0.18 µm standard-cell process. The ASIC can accommodate a VGA 30 fps video with a 120 MHz clock frequency.
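To put the final figure in perspective, the short calculation below estimates the average clock-cycle budget per input pixel, assuming that VGA means 640 × 480 pixels. It ignores the pixel sampling and frame overlap methods mentioned above, so it is only a back-of-the-envelope bound, and the helper name `cycles_per_pixel` is made up for this example.

```python
def cycles_per_pixel(clock_hz, width, height, fps):
    """Average clock cycles available per input pixel."""
    return clock_hz / (width * height * fps)

# 640 x 480 is the assumed VGA resolution; 30 fps and 120 MHz are from the abstract.
budget = cycles_per_pixel(120e6, 640, 480, 30)
print(f"{budget:.2f} cycles per pixel")  # 13.02 cycles per pixel
```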
Chung-Lin WEN Bing-Yu CHEN Yoichi SATO
In this paper, we present an interactive and intuitive graph-cut-based video segmentation system that takes both color and motion information into consideration and provides a stroke-based user interface. Recently, graph-cut-based methods have become prevalent for image and video segmentation. However, most of them deal with color information only and usually fail when some regions of the foreground and the background have similar colors. Unfortunately, this situation is usually hard to avoid, especially when the objects are filmed in a natural environment. To make such methods more practical to use, we propose a graph-cut-based video segmentation method based on both color and motion information, since the foreground objects and the background usually have different motion patterns. Moreover, to make the refinement mechanism easy to use, the strokes drawn by the user are propagated through the spatio-temporal video volume according to the motion information for visualization, so that the user can draw additional strokes to refine the segmentation result in the video volume. The experimental results show that, by combining both color and motion information, our system can resolve incorrect labeling due to color similarity, even when the moving foreground object is behind an occluding object.
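The abstract does not say how color and motion are combined in the graph-cut energy. A minimal sketch of one common choice, a weighted sum of squared distances to foreground and background models, is shown below. The function name `unary_costs`, the mixing weight `alpha`, and all sample values are assumptions for illustration.

```python
def unary_costs(color, motion, fg_model, bg_model, alpha=0.5):
    """Return (cost_fg, cost_bg) for one pixel.

    Each model is a (mean_color, mean_motion) pair. alpha weights the
    color term and (1 - alpha) the motion term. Hypothetical energy,
    not the formulation used by the paper.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    costs = []
    for mean_color, mean_motion in (fg_model, bg_model):
        costs.append(alpha * sq_dist(color, mean_color)
                     + (1.0 - alpha) * sq_dist(motion, mean_motion))
    return costs[0], costs[1]

# A gray pixel whose color is equally far from both color models,
# but whose optical flow matches the moving foreground.
fg = ((0.4, 0.4, 0.4), (2.0, 0.0))
bg = ((0.6, 0.6, 0.6), (0.0, 0.0))
pixel_color, pixel_motion = (0.5, 0.5, 0.5), (2.0, 0.0)

color_only = unary_costs(pixel_color, pixel_motion, fg, bg, alpha=1.0)
combined = unary_costs(pixel_color, pixel_motion, fg, bg, alpha=0.5)
print(color_only, combined)
```

With color alone the two costs tie, which is the ambiguous case described above; adding the motion term makes the foreground label clearly cheaper.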
Noriyuki MINEGISHI Junichi MIYAKOSHI Yuki KURODA Tadayoshi KATAGIRI Yuki FUKUYAMA Ryo YAMAMOTO Masayuki MIYAMA Kousuke IMAMURA Hideo HASHIMOTO Masahiko YOSHIMOTO
An optical flow processor architecture is proposed. It offers accuracy and image-size scalability for video segmentation extraction. The Hierarchical Optical flow Estimation (HOE) algorithm [1] is optimized to provide an appropriate bit-length and iteration number for VLSI implementation. The proposed processor architecture provides the following features. First, an algorithm-oriented data-path is introduced to execute all necessary processes of optical flow derivation, allowing hardware cost minimization. The data-path is designed using a 4-SIMD architecture, which enables high-throughput operation. It thereby achieves real-time optical flow derivation with 100% pixel density. Second, it has a scalable architecture for higher accuracy and higher resolution. A third feature is the CMOS-process-compatible on-chip 2-port DRAM for die-area reduction. The proposed processor can process CIF video at 30 fr/s with a 189 MHz clock frequency. Its estimated core size is 6.02 × 5.33 mm² with six-metal 90-nm CMOS technology.
We consider the edge-linking approach for accurately locating moving object boundaries in video segmentation. We review the existing methods and propose a scheme designed for efficiency and better accuracy. The scheme first obtains a very rough outline of an object by a suitable means, e.g., change detection. It then forms a relatively compact image region that properly contains the object, through a procedure termed "mask sketch." Finally, the outermost edges in the region are found and linked via a shortest-path algorithm. Experiments show that the scheme yields good performance.
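The final linking step can be illustrated with Dijkstra's algorithm on a pixel grid. The abstract does not specify the graph or the cost, so the sketch below assumes an 8-connected grid in which entering a pixel costs less when its edge strength is high. The function name `link_edges` and the sample edge map are hypothetical.

```python
import heapq

def link_edges(cost, start, goal):
    """Cheapest 8-connected pixel path from start to goal (Dijkstra).

    cost[r][c] is the price of visiting pixel (r, c), start included.
    """
    height, width = len(cost), len(cost[0])
    dist = {start: cost[start[0]][start[1]]}
    prev = {}
    heap = [(dist[start], start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if d > dist[(r, c)]:
            continue  # stale queue entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < height and 0 <= nc < width:
                    nd = d + cost[nr][nc]
                    if nd < dist.get((nr, nc), float("inf")):
                        dist[(nr, nc)] = nd
                        prev[(nr, nc)] = (r, c)
                        heapq.heappush(heap, (nd, (nr, nc)))
    # Walk the predecessor links back from the goal.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical edge-strength map: 9 marks a strong edge pixel.
strength = [
    [9, 9, 0, 0],
    [0, 0, 9, 0],
    [0, 0, 0, 9],
]
# Assumed cost: strong edges are cheap (1), everything else expensive (10).
cost = [[10 - s for s in row] for row in strength]
path = link_edges(cost, (0, 0), (2, 3))
print(path)  # [(0, 0), (0, 1), (1, 2), (2, 3)]
```

The returned path follows the strong edge pixels even though they do not lie on a straight line, which is the behavior an edge-linking step relies on.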