Author Search Result

[Author] Weizhe ZHANG(3hit)

1-3hit
  • Exploring Social Relations for Personalized Tag Recommendation in Social Tagging Systems

    Kaipeng LIU  Binxing FANG  Weizhe ZHANG  

     
    PAPER

      Vol:
    E94-D No:3
      Page(s):
    542-551

    With the emergence of Web 2.0, social tagging systems become highly popular in recent years and thus form the so-called folksonomies. Personalized tag recommendation in social tagging systems is to provide a user with a ranked list of tags for a specific resource that best serves the user's needs. Many existing tag recommendation approaches assume that users are independent and identically distributed. This assumption ignores the social relations between users, which are increasingly popular nowadays. In this paper, we investigate the role of social relations in the task of tag recommendation and propose a personalized collaborative filtering algorithm. In addition to the social annotations made by collaborative users, we inject the social relations between users and the content similarities between resources into a graph representation of folksonomies. To fully explore the structure of this graph, instead of computing similarities between objects using feature vectors, we exploit the method of random-walk computation of similarities, which furthermore enable us to model a user's tag preferences with the similarities between the user and all the tags. We combine both the collaborative information and the tag preferences to recommend personalized tags to users. We conduct experiments on a dataset collected from a real-world system. The results of comparative experiments show that the proposed algorithm outperforms state-of-the-art tag recommendation algorithms in terms of prediction quality measured by precision, recall and NDCG.

  • Exploring Web Partition in DHT-Based Distributed Web Crawling

    Xiao XU  Weizhe ZHANG  Hongli ZHANG  Binxing FANG  

     
    PAPER

      Vol:
    E93-D No:11
      Page(s):
    2907-2921

    The basic requirements of the distributed Web crawling systems are: short download time, low communication overhead and balanced load which largely depends on the systems' Web partition strategies. In this paper, we propose a DHT-based distributed Web crawling system and several DHT-based Web partition methods. First, a new system model based on a DHT method called the Content Addressable Network (CAN) is proposed. Second, based on this model, a network-distance-based Web partition is implemented to reduce the crawler-crawlee network distance in a fully distributed manner. Third, by utilizing the locality on the link space, we propose the concept of link-based Web partition to reduce the communication overhead of the system. This method not only reduces the number of inter-links to be exchanged among the crawlers but also reduces the cost of routing on the DHT overlay. In order to combine the benefits of the above two Web partition methods, we then propose 2 distributed multi-objective Web partition methods. Finally, all the methods we propose in this paper are compared with existing system models in the simulated experiments under different datasets and different system scales. In most cases, the new methods show their superiority.

  • Efficient Distributed Web Crawling Utilizing Internet Resources

    Xiao XU  Weizhe ZHANG  Hongli ZHANG  Binxing FANG  

     
    PAPER-Data Engineering, Web Information Systems

      Vol:
    E93-D No:10
      Page(s):
    2747-2762

    Internet computing is proposed to exploit personal computing resources across the Internet in order to build large-scale Web applications at lower cost. In this paper, a DHT-based distributed Web crawling model based on the concept of Internet computing is proposed. Also, we propose two optimizations to reduce the download time and waiting time of the Web crawling tasks in order to increase the system's throughput and update rate. Based on our contributor-friendly download scheme, the improvement on the download time is achieved by shortening the crawler-crawlee RTTs. In order to accurately estimate the RTTs, a network coordinate system is combined with the underlying DHT. The improvement on the waiting time is achieved by redirecting the incoming crawling tasks to light-loaded crawlers in order to keep the queue on each crawler equally sized. We also propose a simple Web site partition method to split a large Web site into smaller pieces in order to reduce the task granularity. All the methods proposed are evaluated through real Internet tests and simulations showing satisfactory results.

FlyerIEICE has prepared a flyer regarding multilingual services. Please use the one in your native language.