Author Search Result

[Author] Zhihua YING (1 hit)

  • Localization of Pointed-At Word in Printed Documents via a Single Neural Network

    Rubin ZHAO  Xiaolong ZHENG  Zhihua YING  Lingyan FAN  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2022/01/26
    Vol: E105-D No: 5
    Page(s): 1075-1084

    Most existing object detection and text detection methods are mainly designed to detect either text or objects. When the task is to find the target word pointed at by an object, the results of existing methods are far from satisfactory. Yet such scenarios arise often in human-computer interaction, when the computer needs to determine which word the user is pointing at. Compared with object detection, pointed-at word localization (PAWL) requires higher accuracy, especially in dense text. Moreover, characters in printed documents are much smaller than those in scene text detection datasets such as ICDAR-2013, ICDAR-2015, and ICPR-2018. To address these problems, the authors propose a novel target word localization network (TWLN) to detect the pointed-at word in printed documents. A single deep neural network is trained to extract the features of markers and text sequentially. For each image, the location of the marker is predicted first; based on that prediction, a smaller image is cropped from the original image and fed into the same network, which then predicts the location of the pointed-at word. To train and test the network, an efficient approach is proposed that generates the dataset from PDF documents by inserting markers pointing at words in the documents, which avoids laborious manual labeling. Experiments on the proposed dataset show that TWLN outperforms the compared object detection method and optical character recognition method on every category of target, especially when the target is a single character that occupies only a few pixels in the image. TWLN is also tested on real photographs, and the accuracy shows no significant difference, which supports the validity of the dataset generation method.
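
    The two-pass procedure described in the abstract (predict the marker, crop around it, run the same network again on the crop) can be sketched roughly as follows. This is only an illustration of the coordinate bookkeeping, not the authors' implementation: the function names, the box format, the crop size, and the two predictor callables are all hypothetical stand-ins for the network, and the paper may handle cropping at image borders differently.

```python
from typing import Callable, Tuple

# Assumed box format (not from the paper): (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]
Point = Tuple[int, int]


def crop_window(center: Point, image_size: Tuple[int, int],
                crop_size: Tuple[int, int]) -> Box:
    """Return a crop_size window centred on `center`, shifted to stay inside the image."""
    (cx, cy), (w, h), (cw, ch) = center, image_size, crop_size
    cw, ch = min(cw, w), min(ch, h)          # never request more than the image holds
    x0 = min(max(cx - cw // 2, 0), w - cw)   # clamp the left edge
    y0 = min(max(cy - ch // 2, 0), h - ch)   # clamp the top edge
    return (x0, y0, x0 + cw, y0 + ch)


def to_image_coords(box_in_crop: Box, window: Box) -> Box:
    """Translate a box predicted inside the crop back to full-image coordinates."""
    x0, y0 = window[0], window[1]
    bx0, by0, bx1, by1 = box_in_crop
    return (bx0 + x0, by0 + y0, bx1 + x0, by1 + y0)


def locate_pointed_word(image_size: Tuple[int, int],
                        crop_size: Tuple[int, int],
                        predict_marker: Callable[[], Point],
                        predict_word: Callable[[Box], Box]) -> Box:
    """Two passes: marker location first, then the word box inside the crop.

    `predict_marker` and `predict_word` are hypothetical callables standing in
    for the two forward passes of the single shared network.
    """
    marker_xy = predict_marker()                       # pass 1: where is the marker?
    window = crop_window(marker_xy, image_size, crop_size)
    word_box_in_crop = predict_word(window)            # pass 2: word box in crop coords
    return to_image_coords(word_box_in_crop, window)


if __name__ == "__main__":
    # Stub predictors with fixed outputs, for illustration only.
    box = locate_pointed_word(
        image_size=(1000, 800),
        crop_size=(200, 100),
        predict_marker=lambda: (500, 400),
        predict_word=lambda window: (20, 30, 60, 45),
    )
    print(box)
```

    Cropping around the marker means the second pass sees the small characters at a larger relative scale, which is one plausible reading of why the abstract reports gains on targets that occupy only a few pixels.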
