Jia QIN Huihui BAI Mengmeng ZHANG Yao ZHAO
High Efficiency Video Coding (HEVC) is the latest coding standard. Compared with Advanced Video Coding (H.264/AVC), HEVC offers about a 50% bitrate reduction at the same reconstructed video quality. However, this new coding standard leads to enormous computational complexity, which makes it difficult to encode video in real time. Therefore, in this paper, aiming at the high complexity of intra coding in HEVC, a new fast coding unit (CU) splitting algorithm is proposed based on a decision tree. A decision tree, as a machine learning method, can be designed to determine the size of CUs adaptively. Here, two significant features, the Just Noticeable Difference (JND) values and the coding bits of each CU, are extracted to train the decision tree according to their relationships with the CUs' partitions. The experimental results reveal that the proposed algorithm saves about 34% of encoding time on average, with only a small increase in BD-rate under the “All_Intra” setting, compared with the HEVC reference software.
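The idea of learning a split decision from the two features can be sketched as follows. This is only a minimal illustration, not the trained tree from the paper: it fits a depth-one decision tree (a stump) by minimizing Gini impurity, and the function names, the toy feature values, and the labels are all hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (JND value, coding bits) of one CU; toy units


def gini(labels: List[int]) -> float:
    """Gini impurity of binary labels (1 = split the CU, 0 = keep it)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(samples: List[Sample], labels: List[int]) -> Tuple[int, float, int, int]:
    """Fit a depth-one decision tree.

    Returns (feature_index, threshold, left_label, right_label), chosen to
    minimize the weighted Gini impurity of the two child nodes.
    """
    best = (0, 0.0, 0, 0)
    best_score = float("inf")
    for f in range(2):
        values = sorted({s[f] for s in samples})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [y for s, y in zip(samples, labels) if s[f] <= thr]
            right = [y for s, y in zip(samples, labels) if s[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best_score = score
                # Each leaf predicts the majority label of its samples.
                left_label = 1 if 2 * sum(left) >= len(left) else 0
                right_label = 1 if 2 * sum(right) >= len(right) else 0
                best = (f, thr, left_label, right_label)
    return best


def predict_split(stump: Tuple[int, float, int, int], sample: Sample) -> int:
    """Return 1 if the stump says the CU should be split, else 0."""
    f, thr, left_label, right_label = stump
    return left_label if sample[f] <= thr else right_label


# Hypothetical training data: (JND, bits) pairs with split / no-split labels.
TOY_SAMPLES: List[Sample] = [
    (2.0, 120.0), (3.0, 150.0), (2.5, 900.0),
    (8.0, 1000.0), (9.0, 1100.0), (7.5, 200.0),
]
TOY_LABELS: List[int] = [0, 0, 1, 1, 1, 0]
```

In an encoder, such a predictor would replace the exhaustive rate-distortion check at each CU depth; a real tree would be deeper and trained on features extracted from actual encodes.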
ShaoWei WENG Yao ZHAO Jeng-Shyang PAN
A reversible data hiding scheme based on the companding technique and the difference expansion (DE) of triplets is proposed in this paper. The companding technique is employed to increase the number of the expandable triplets. The capacity consumed by the location map recording the expanded positions is largely decreased. As a result, the hiding capacity is considerably increased. The experimental results reveal that high hiding capacity can be achieved at low embedding distortion.
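For readers unfamiliar with difference expansion, the sketch below shows the classic pair-based form, in which one bit is hidden in the expanded difference of two pixel values and both pixels are recovered exactly. It is a simplified stand-in: the scheme above works on triplets and adds companding and a location map, none of which is modeled here, and overflow checks are omitted. The function names are hypothetical.

```python
from typing import Tuple


def de_embed(x: int, y: int, bit: int) -> Tuple[int, int]:
    """Hide one bit in the pixel pair (x, y) by expanding their difference."""
    avg = (x + y) // 2          # integer average, unchanged by embedding
    diff = x - y                # difference to be expanded
    expanded = 2 * diff + bit   # shift the difference left and append the bit
    return avg + (expanded + 1) // 2, avg - expanded // 2


def de_extract(x2: int, y2: int) -> Tuple[int, int, int]:
    """Recover the original pair and the hidden bit from a marked pair."""
    avg = (x2 + y2) // 2
    expanded = x2 - y2
    bit = expanded & 1          # the appended bit is the least significant bit
    diff = expanded // 2        # floor division undoes the expansion
    return avg + (diff + 1) // 2, avg - diff // 2, bit
```

For example, `de_embed(206, 201, 1)` gives `(209, 198)`, and `de_extract(209, 198)` returns `(206, 201, 1)`.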
Ting ZHANG Huihui BAI Mengmeng ZHANG Yao ZHAO
Multiple description (MD) coding is an attractive framework for robust information transmission over non-prioritized and unpredictable networks. In this paper, a novel MD image coding scheme is proposed based on convolutional neural networks (CNNs), which aims to improve the reconstruction quality of the side and central decoders. To this end, a given image is first encoded into two independent descriptions by sub-sampling. Such a design makes the proposed method compatible with existing image coding standards. At the decoder, in order to achieve high-quality side and central image reconstruction, three CNNs, comprising two side decoder sub-networks and one central decoder sub-network, are integrated into an end-to-end reconstruction framework. Experimental results show the improvement achieved by the proposed scheme in terms of both peak signal-to-noise ratio values and subjective quality. The proposed method demonstrates better rate versus central and side distortion performance.
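The sub-sampling step and the two decoding cases can be illustrated with a small sketch. The row-wise pattern is an assumption (the abstract does not state the sub-sampling pattern), and plain row repetition stands in for the CNN side decoders, so this shows only the data flow, not the proposed reconstruction quality. All names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as a list of pixel rows


def split_descriptions(image: Image) -> Tuple[Image, Image]:
    """Sub-sample by rows: even rows form description 1, odd rows description 2."""
    return image[0::2], image[1::2]


def central_decode(d1: Image, d2: Image) -> Image:
    """Both descriptions arrived: interleave their rows to rebuild the image."""
    out: Image = []
    for i in range(len(d1) + len(d2)):
        src = d1 if i % 2 == 0 else d2
        out.append(list(src[i // 2]))
    return out


def side_decode(d: Image, height: int) -> Image:
    """Only one description arrived: fill each missing row with a neighbouring
    received row (a crude placeholder for a learned side decoder)."""
    return [list(d[min(i // 2, len(d) - 1)]) for i in range(height)]
```

With both descriptions, `central_decode` returns the original rows exactly (before any lossy compression of the descriptions); with one, `side_decode` returns an image of the same height at reduced vertical detail.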
Yinwei ZHAN Yaodong LI Zhuo YANG Yao ZHAO Huaiyu WU
The heat map is an important tool for eye tracking data analysis and visualization. It expresses the area watched by an observer very intuitively, but ignores the saccade information that expresses gaze shift. Building on the conventional heat map generation method, this paper presents a novel heat map generation method for eye tracking data. The proposed method introduces a mixed data structure of fixation points and saccades, and considers heat map deformation for saccade-type data. The proposed method has the advantage of indicating the gaze transition direction while visualizing the gaze region.
Lili MENG Yao ZHAO Anhong WANG Jeng-Shyang PAN Huihui BAI
A stereo video coding scheme that is compatible with mono-view processors is presented in this paper. In addition, this paper proposes an adaptive prediction structure that applies different prediction modes to different groups of pictures (GOPs) according to their temporal and inter-view correlations, in order to improve the coding efficiency. Moreover, the advanced video coding standard H.264 is used to maximize the coding efficiency. Finally, the effectiveness of the proposed scheme is verified by extensive experimental results.
Yufeng ZHAO Yao ZHAO Zhenfeng ZHU Jeng-Shyang PAN
A novel automatic image annotation (AIA) scheme is proposed based on multiple-instance learning (MIL). For a given concept, manifold ranking (MR) is first applied to MIL (referred to as MR-MIL) to effectively mine the positive instances (i.e., regions in images) embedded in the positive bags (i.e., images). With the mined positive instances, the semantic model of the concept is built from the probabilistic output of an SVM classifier. The experimental results reveal that high annotation accuracy can be achieved at the region level.
Wei ZHAO Pengpeng YANG Rongrong NI Yao ZHAO Haorui WU
Recently, the image forensics community has paid attention to designing effective algorithms based on deep learning. Experience has shown that combining domain knowledge of image forensics with deep learning achieves more robust and better performance than traditional schemes. Instead of improving algorithm performance, this paper considers the security of deep learning-based methods in the field of image forensics. To the best of our knowledge, this is the first work focusing on this topic. Specifically, we find experimentally that deep learning-based methods fail when slight noise is added to the images (adversarial images). Furthermore, two strategies are proposed to strengthen the security of deep learning-based methods. First, a penalty term is added to the loss function, namely the 2-norm of the gradient of the loss with respect to the input images. Second, a novel training method is adopted that trains the model on a mixture of normal and adversarial images. Experimental results show that the proposed algorithm achieves good performance even on adversarial images and provides a security consideration for deep learning-based image forensics.
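The gradient penalty can be made concrete with a toy model. In the sketch below a logistic-regression classifier stands in for the deep network, because its input gradient has a closed form; the weighting factor `lam`, the function names, and the choice of binary cross-entropy are assumptions, not details from the paper.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Logistic function (adequate for the moderate z values of a sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_with_logit(z: float, y: int) -> float:
    """Numerically stable binary cross-entropy for logit z and label y."""
    return max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))


def input_grad_norm(w: List[float], b: float, x: List[float], y: int) -> float:
    """2-norm of d(loss)/dx for the model p = sigmoid(w . x + b).

    For this linear model the input gradient is (p - y) * w.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = sigmoid(z)
    grad_x = [(p - y) * wi for wi in w]
    return math.sqrt(sum(g * g for g in grad_x))


def penalized_loss(w: List[float], b: float, x: List[float], y: int, lam: float) -> float:
    """Classification loss plus lam times the input-gradient 2-norm."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return bce_with_logit(z, y) + lam * input_grad_norm(w, b, x, y)
```

Minimizing the penalty discourages the loss from changing sharply under small input perturbations, which is what an adversarial example exploits. In a deep network the same term would be computed by automatic differentiation rather than in closed form.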
Nan LIU Yao ZHAO Zhenfeng ZHU Rongrong NI
This paper presents a commercial shot classification scheme combining well-designed visual and textual features to automatically detect TV commercials. To identify the inherent difference between commercials and general programs, a special mid-level textual descriptor is proposed, aiming to capture the spatio-temporal properties of the video texts typical of commercials. In addition, we introduce an ensemble-learning based combination method, named Co-AdaBoost, to interactively exploit the intrinsic relations between the visual and textual features employed.
Huihui BAI Mengmeng ZHANG Anhong WANG Meiqin LIU Yao ZHAO
A novel standard-compliant multiple description (MD) video codec is proposed in this paper, which aims to achieve effective redundancy allocation using inter- and intra-description correlation. The inter-description correlation at the macroblock (MB) level is used to produce side information for different modes, which helps to improve side decoding quality. Furthermore, the intra-description correlation at the MB level is exploited to design an adaptive skip mode for higher compression efficiency. The experimental results exhibit better rate versus side and central distortion performance compared with other relevant MDC schemes.
Meng ZHANG Huihui BAI Meiqin LIU Anhong WANG Mengmeng ZHANG Yao ZHAO
As an ongoing video compression standard, High Efficiency Video Coding (HEVC) has achieved better rate-distortion performance than H.264, but it also leads to enormous encoding complexity. In this paper, we propose a novel fast coding unit partition algorithm for the intra prediction of HEVC. First, instead of the time-consuming rate-distortion optimization for coding mode decision, just-noticeable-difference (JND) values are exploited to partition the coding unit according to human visual system characteristics. Furthermore, the coding bits in HEVC are used as auxiliary information to refine the partition results. Compared with the HEVC test model HM10.1, the experimental results show that the fast intra mode decision algorithm saves over 28% of encoding time on average with comparable rate-distortion performance.
Lijing MA Huihui BAI Mengmeng ZHANG Yao ZHAO
In this paper, a novel adaptive sampling scheme for block compressive sensing of natural images is proposed. Based on the image content, the edge proportion in a block is used to represent its sparsity. According to this edge proportion, the sampling rate is then allocated adaptively for better compressive sensing recovery. Because an image contains many blocks, recording the measurement ratio of every block may incur an overhead cost. Therefore, the K-means method is applied to classify the blocks into clusters, and each cluster is allocated its own measurement ratio. In addition, we design an iteration termination condition to reduce the time consumed by the iterative compressive sensing recovery. The experimental results show that, compared with the corresponding methods, the proposed scheme obtains a better reconstructed image at the same sampling rate.
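The cluster-based allocation can be sketched as follows, assuming the edge proportion of each block has already been computed. The mapping from cluster to measurement ratio (linear in the rank of the cluster centroid, between hypothetical bounds `r_min` and `r_max`) is an illustrative assumption, as are the function names; the abstract does not specify the allocation rule.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, iters: int = 50) -> Tuple[List[float], List[int]]:
    """Plain K-means on scalar values with a deterministic initialization."""
    srt = sorted(values)
    # Start from k evenly spaced order statistics of the data.
    centroids = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]
    assign = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid for every value.
        assign = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        new = []
        for j in range(k):
            members = [v for v, a in zip(values, assign) if a == j]
            new.append(sum(members) / len(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return centroids, assign


def allocate_ratios(edge_props: List[float], k: int, r_min: float, r_max: float) -> List[float]:
    """Give each block the measurement ratio of its cluster.

    Clusters with a larger mean edge proportion (less sparse blocks) receive a
    larger ratio; only k ratios need to be recorded instead of one per block.
    """
    centroids, assign = kmeans_1d(edge_props, k)
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {j: r for r, j in enumerate(order)}
    step = (r_max - r_min) / (k - 1) if k > 1 else 0.0
    cluster_ratio = [r_min + rank[j] * step for j in range(k)]
    return [cluster_ratio[a] for a in assign]
```

With six blocks whose edge proportions fall into a low group and a high group, `allocate_ratios(props, 2, 0.1, 0.3)` assigns a ratio of about 0.1 to the smooth blocks and about 0.3 to the edge-rich ones.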
Huawei TIAN Yao ZHAO Zheng WANG Rongrong NI Lunming QIN
With the rapid development of multi-view video coding (MVC) and light field rendering (LFR), Free-View Television (FTV) has emerged as a new form of entertainment equipment that offers TV viewers a more immersive and realistic experience. In an FTV broadcasting system, the viewer can freely watch a realistic arbitrary view of a scene generated from a number of original views. In such a scenario, the ownership of the multi-view video should be verifiable not only on the original views, but also on any virtual view. However, the capacity of existing watermarking schemes for copyright protection of LFR-based FTV is only one bit, i.e., the presence or absence of the watermark, which seriously limits their use in practical scenarios. In this paper, we propose a robust multi-bit watermarking scheme for LFR-based free-view video. A direct-sequence code division multiple access (DS-CDMA) watermark is constructed from the multi-bit message and embedded into the DCT domain of each view frame. The message can be extracted bit by bit with a correlation detector from a virtual frame generated at an arbitrary viewpoint. Furthermore, we prove mathematically that the watermark can be detected from any virtual view. Experimental results also show that the watermark in FTV can be successfully detected from a virtual view. Moreover, the proposed watermarking method is robust against common signal processing attacks, such as Gaussian filtering, salt-and-pepper noise, JPEG compression, and center cropping.
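The DS-CDMA embedding and the correlation detector can be illustrated in a few lines. In this sketch a list of numbers stands in for the DCT coefficients of one view frame; the DCT itself, the rendering of virtual views, and the attacks are not modeled, and the embedding strength `alpha`, the seeds, and the function names are hypothetical.

```python
import random
from typing import List


def pn_sequences(num_bits: int, length: int, seed: int) -> List[List[int]]:
    """One pseudo-random +/-1 spreading code per message bit."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(length)] for _ in range(num_bits)]


def embed(coeffs: List[float], bits: List[int], codes: List[List[int]], alpha: float) -> List[float]:
    """Add the sum of the bit-modulated spreading codes to the host coefficients."""
    marked = list(coeffs)
    for bit, code in zip(bits, codes):
        sign = 1 if bit else -1  # map bit 1 -> +1, bit 0 -> -1
        for i, chip in enumerate(code):
            marked[i] += alpha * sign * chip
    return marked


def extract(coeffs: List[float], codes: List[List[int]]) -> List[int]:
    """Correlation detector: the sign of the correlation with each code gives the bit."""
    return [
        1 if sum(c * chip for c, chip in zip(coeffs, code)) > 0 else 0
        for code in codes
    ]
```

Because the codes are nearly orthogonal, each bit can be read independently: the correlation with one code is dominated by that code's own contribution, while the host signal and the other codes average out as the code length grows.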
Lianpeng LI Jian DONG Decheng ZUO Yao ZHAO Tianyang LI
For cloud data centers, Virtual Machine (VM) consolidation is an effective way to save energy and improve efficiency. However, inappropriate consolidation of VMs, especially aggressive consolidation, can lead to performance problems and, more seriously, to Service Level Agreement (SLA) violations. Therefore, it is very important to balance the tradeoff between reducing energy use and reducing the level of SLA violations. In this paper, we propose two Host State Detection algorithms and an improved VM placement algorithm based on our proposed Host State Binary Decision Tree Prediction model for SLA-aware and energy-efficient consolidation of VMs in cloud data centers. We propose two condition formulas for host state estimation, and our model uses them to manually build a Binary Decision Tree for host state detection. We extend the CloudSim simulator to evaluate our algorithms using a PlanetLab workload and a random workload. The experimental results show that our proposed model can significantly reduce SLA violation rates while keeping energy cost efficient: for the real-world workload, it reduces the SLAV metric by up to 98.12% and the Energy metric by up to 33.96%.