Xiao XU Weizhe ZHANG Hongli ZHANG Binxing FANG
Internet computing is proposed to exploit personal computing resources across the Internet in order to build large-scale Web applications at lower cost. In this paper, we propose a DHT-based distributed Web crawling model built on the concept of Internet computing, together with two optimizations that reduce the download time and the waiting time of Web crawling tasks so as to increase the system's throughput and update rate. Based on our contributor-friendly download scheme, the download time is improved by shortening the crawler-crawlee RTTs; to estimate these RTTs accurately, a network coordinate system is combined with the underlying DHT. The waiting time is improved by redirecting incoming crawling tasks to lightly loaded crawlers so as to keep the queues on all crawlers equally sized. We also propose a simple Web site partition method that splits a large Web site into smaller pieces in order to reduce task granularity. All the proposed methods are evaluated through real Internet tests and simulations, with satisfactory results.
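The waiting-time optimization amounts to a load-balancing rule. The Python sketch below illustrates one simple way to redirect an incoming task to a lightly loaded crawler (a power-of-d-choices probe); the class and function names are hypothetical, and the paper's actual redirection protocol over the DHT may differ.

```python
import random

class Crawler:
    """A crawler node in the overlay (illustrative)."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.queue = []  # pending crawling tasks

def assign_task(crawlers, task, probe_size=3):
    """Redirect an incoming task to a lightly loaded crawler.

    Probing a few random nodes and enqueueing on the shortest queue
    keeps the queues roughly equal in size across crawlers.
    """
    probed = random.sample(crawlers, min(probe_size, len(crawlers)))
    target = min(probed, key=lambda c: len(c.queue))
    target.queue.append(task)
    return target
```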
Zhikai XU Hongli ZHANG Xiangzhan YU Shen SU
Location-based services (LBSs) are useful for many applications in the Internet of Things (IoT). However, LBSs have raised serious concerns about users' location privacy. In this paper, we propose a new location privacy attack in LBSs called the hidden location inference attack, in which the adversary infers users' hidden locations from their check-in histories. We identify three factors that influence individual check-in behavior: geographic information, human mobility patterns, and user preferences. We first evaluate the effect of each of these factors on users' check-in behavior separately. Next, we propose a novel algorithm that integrates these heterogeneous factors and captures the probability of hidden location privacy leakage. We then design a privacy alert framework that warns users when their sharing behavior does not match their sharing rules. Finally, experimental results demonstrate the validity and practicality of the proposed strategy.
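As an illustration of how heterogeneous factor scores might be fused into a single leakage probability, the sketch below uses a weighted log-linear combination; the weights, function names, and the fusion rule itself are assumptions for exposition, not the paper's actual model.

```python
import math

def leakage_probability(geo_score, mobility_score, preference_score,
                        weights=(0.4, 0.3, 0.3)):
    """Illustrative fusion of the three check-in factors.

    Each score is assumed to be a probability in (0, 1]; a weighted
    log-linear combination is one simple way to fuse heterogeneous
    factors into a single leakage probability.
    """
    w_geo, w_mob, w_pref = weights
    log_p = (w_geo * math.log(geo_score)
             + w_mob * math.log(mobility_score)
             + w_pref * math.log(preference_score))
    return math.exp(log_p)

def should_alert(p_leak, sharing_threshold=0.5):
    """Warn the user when predicted leakage exceeds their sharing rule."""
    return p_leak > sharing_threshold
```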
Mahmoud EMAM Qi HAN Liyang YU Hongli ZHANG
The copy-move or region duplication forgery technique is a very common type of image manipulation, in which a region of an image is copied and pasted elsewhere in the same image to hide some detail. In this paper, a keypoint-based method for copy-move forgery detection is proposed. First, feature points are detected in the image using the Förstner operator. Second, features are extracted with the MROGH descriptor and then matched against one another. Finally, the affine transformation parameters are estimated using the RANSAC algorithm. Experimental results confirm that the proposed method effectively locates altered regions even under geometric transformations (rotation and scaling).
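The pipeline in the abstract (keypoint detection, description, within-image matching, RANSAC affine estimation) can be sketched with OpenCV as below. Since neither the Förstner operator nor the MROGH descriptor ships with OpenCV, ORB is used as a stand-in detector/descriptor; the structure of the pipeline, not the exact operators, is what the sketch shows.

```python
import cv2
import numpy as np

def detect_copy_move(image_path, ratio=0.75, min_matches=10):
    """Keypoint-based copy-move detection sketch (ORB as a stand-in)."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=5000)
    kps, desc = orb.detectAndCompute(img, None)

    # Match each descriptor against all others in the same image.
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = bf.knnMatch(desc, desc, k=3)

    src, dst = [], []
    for m in matches:
        if len(m) < 3:
            continue
        # m[0] is the trivial self-match; ratio-test the next two.
        cand, alt = m[1], m[2]
        if cand.distance < ratio * alt.distance:
            p1 = np.array(kps[cand.queryIdx].pt)
            p2 = np.array(kps[cand.trainIdx].pt)
            if np.linalg.norm(p1 - p2) > 10:  # skip near-coincident points
                src.append(p1)
                dst.append(p2)

    if len(src) < min_matches:
        return None  # no duplicated region found
    # Estimate the affine transform between the duplicated regions.
    M, inliers = cv2.estimateAffine2D(np.float32(src), np.float32(dst),
                                      method=cv2.RANSAC,
                                      ransacReprojThreshold=3.0)
    return M, inliers
```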
Yanbin SUN Yu ZHANG Binxing FANG Hongli ZHANG
Information-Centric Networking (ICN) treats content as a first-class citizen and adopts name-based routing for content distribution and retrieval: content names rather than IP addresses are used directly for routing. However, owing to location-independent naming and the huge namespace, name-based routing faces scalability and efficiency issues, including large routing tables and high path stretch. This paper proposes a universal Scalable Name-based Geometric Routing scheme (SNGR), a careful synthesis of geometric routing and name resolution. To provide scalable and efficient underlying routing, a universal geometric routing framework (GRF) is proposed; any geometric routing scheme can be used directly for name resolution on top of GRF. To implement the overlay name resolution system, SNGR uses a bi-level grouping design, which ensures that a resolution node close to the consumer can always be found. Our theoretical analyses guarantee the performance of SNGR, and experiments show that SNGR outperforms similar routing schemes in terms of node state, path stretch, and reliability.
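For intuition about the underlying routing layer, the sketch below shows plain greedy geometric forwarding, the kind of scheme GRF is meant to host: each hop forwards to the neighbor whose coordinate is closest to the destination coordinate. The graph and coordinate structures are illustrative; SNGR's framework admits any geometric routing scheme and real schemes include recovery from the local minima this sketch merely detects.

```python
import math

def greedy_route(graph, coords, src, dst_coord):
    """Greedy geometric forwarding sketch.

    `graph` maps a node to its neighbors and `coords` maps a node to
    its geometric coordinate; each hop forwards to the neighbor
    closest to the destination coordinate.
    """
    path = [src]
    current = src
    while math.dist(coords[current], dst_coord) > 0:
        nxt = min(graph[current],
                  key=lambda n: math.dist(coords[n], dst_coord))
        if math.dist(coords[nxt], dst_coord) >= math.dist(coords[current],
                                                          dst_coord):
            break  # local minimum: a real scheme would invoke recovery
        current = nxt
        path.append(current)
    return path
```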
Xiao XU Weizhe ZHANG Hongli ZHANG Binxing FANG
The basic requirements of distributed Web crawling systems are short download time, low communication overhead, and balanced load, all of which depend largely on the system's Web partition strategy. In this paper, we propose a DHT-based distributed Web crawling system and several DHT-based Web partition methods. First, a new system model based on a DHT method called the Content Addressable Network (CAN) is proposed. Second, based on this model, a network-distance-based Web partition is implemented to reduce the crawler-crawlee network distance in a fully distributed manner. Third, by exploiting locality in the link space, we propose the concept of link-based Web partition to reduce the system's communication overhead. This method reduces both the number of inter-links to be exchanged among the crawlers and the cost of routing on the DHT overlay. To combine the benefits of the two Web partition methods above, we then propose two distributed multi-objective Web partition methods. Finally, all the methods proposed in this paper are compared with existing system models in simulated experiments over different datasets and system scales. In most cases, the new methods prove superior.
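As a baseline for the partition methods discussed, the sketch below shows the plain hash-based assignment that CAN implies: a site is hashed to a point in the d-dimensional key space and owned by whichever crawler's zone contains that point. The zone representation and function names are illustrative; the network-distance- and link-based partitions in the paper refine this baseline.

```python
import hashlib

def site_to_point(hostname, dims=2):
    """Map a Web site to a point in the d-dimensional CAN key space [0,1)^d."""
    digest = hashlib.sha1(hostname.encode()).digest()
    # Derive one coordinate in [0, 1) from each 4-byte slice of the digest.
    return tuple(int.from_bytes(digest[4 * i:4 * (i + 1)], 'big') / 2**32
                 for i in range(dims))

def owner_of(point, zones):
    """Find the crawler whose zone (a list of (lo, hi) per dim) holds the point."""
    for crawler, zone in zones.items():
        if all(lo <= x < hi for x, (lo, hi) in zip(point, zone)):
            return crawler
    return None
```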
With the emergence of large quantities of data in science and industry, it is urgent to improve the prediction accuracy and reduce the high computational complexity of Gaussian process regression (GPR). However, the traditional global and local approximations have corresponding shortcomings: global approximation tends to ignore local features, while local approximation is prone to over-fitting. To solve these problems, a large-scale Gaussian process regression algorithm (RFFLT) combining random Fourier features (RFF) and local approximation is proposed. 1) To speed up training, the input data are mapped into a random low-dimensional feature space via a random Fourier feature map. The main innovation is to design features using existing fast linear methods so that the inner product of the transformed data is approximately equal to the inner product in the feature space of a user-specified shift-invariant kernel. 2) For local approximation, a generalized robust Bayesian committee machine (GRBCM) based on Tsallis mutual information is used, which enhances the flexibility of the model and generates a sparser representation of the expert weight distribution than previous work. RFFLT was tested on six real data sets, greatly shortening regression prediction time while improving prediction accuracy.
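The RFF step follows the standard Rahimi-Recht construction for shift-invariant kernels, sketched below for the RBF kernel; the GRBCM expert fusion with Tsallis mutual information is not shown, and the parameter names are illustrative.

```python
import numpy as np

def random_fourier_features(X, n_features=256, gamma=1.0, seed=0):
    """Random Fourier feature map approximating the RBF kernel.

    For k(x, y) = exp(-gamma * ||x - y||^2), sample W ~ N(0, 2*gamma*I)
    and b ~ U(0, 2*pi); then z(x) = sqrt(2/D) * cos(W^T x + b) satisfies
    z(x)^T z(y) ~= k(x, y).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Usage: Z = random_fourier_features(X); fitting a linear (Bayesian)
# regression on Z then approximates full GPR with the RBF kernel.
```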