Chee Siang LEOW Hideaki YAJIMA Tomoki KITAGAWA Hiromitsu NISHIZAKI
Text detection is a crucial pre-processing step in optical character recognition (OCR) for the accurate recognition of text, including both fonts and handwritten characters, in documents. While current deep learning-based text detection tools can detect text regions with high accuracy, they often treat multiple lines of text as a single region. To perform line-based character recognition, it is necessary to divide the text into individual lines, which requires a line detection technique. This paper focuses on the development of a new approach to single-line detection in OCR that is based on the existing Character Region Awareness For Text detection (CRAFT) model and incorporates a deep neural network specialized in line segmentation. However, this new method may still detect multiple lines as a single text region when multi-line text with narrow spacing is present. To address this, we also introduce a post-processing algorithm to detect single text regions using the output of the single-line segmentation. Our proposed method successfully detects single lines, even in multi-line text with narrow line spacing, and hence improves the accuracy of OCR.
A new adaptive binarization method is proposed for the vehicle state images obtained from the intelligent operation and maintenance system of rail transit. The method can check the corresponding vehicle status information in the intelligent operation and maintenance system of rail transit more quickly and effectively, track and monitor the vehicle operation status in real time, and improve the emergency response ability of the system. The proposed method has two main advantages. For decolorization, we use contrast-preserving decolorization[1] to obtain the appropriate ratio of R, G, and B for converting the RGB image to grayscale, which retains as much of the color information of the vehicle state image background as possible and maintains the contrast between the foreground and the background. For threshold selection, the mean and standard deviation of the gray values corresponding to the multi-color background of the vehicle state images are obtained by major cluster estimation[2], and the adaptive threshold for binarization is determined by the 2-sigma principle, which effectively extracts text, identifiers, and other target information. The experimental results show that, for vehicle state images with rich background color information, this method outperforms traditional binarization methods, such as the threshold-based global Otsu algorithm[3] and local Sauvola algorithm[4],[5], and the statistical-learning-based Mean-Shift[6], K-Means[7], and Fuzzy C-Means[8] algorithms. As an image preprocessing scheme for intelligent rail transit data verification, the method effectively improves the accuracy of text and identifier recognition, as verified by optical character recognition on a data set containing images of different vehicle statuses.
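The 2-sigma thresholding step described above can be illustrated with a minimal sketch. It assumes the background mean and standard deviation are already available; here they are computed from pixels brighter than a fixed cutoff, which is only a stand-in for major cluster estimation and not the authors' procedure. The function name `two_sigma_binarize`, the toy image, and the cutoff of 128 are hypothetical.

```python
from statistics import mean, pstdev

def two_sigma_binarize(gray, bg_mean, bg_std, k=2.0):
    """Mark pixels farther than k * sigma from the background mean as foreground (1)."""
    return [[1 if abs(p - bg_mean) > k * bg_std else 0 for p in row] for row in gray]

# Toy 3x4 grayscale image: background near 200, two dark "text" pixels.
image = [
    [199, 201, 200, 201],
    [199,  40, 202, 198],
    [201, 200,  35, 200],
]

# Stand-in for major cluster estimation (an assumption): treat bright pixels
# as the dominant background cluster and take their mean and standard deviation.
background = [p for row in image for p in row if p > 128]
bg_mean, bg_std = mean(background), pstdev(background)

mask = two_sigma_binarize(image, bg_mean, bg_std)
for row in mask:
    print(row)
```

With real images, the background statistics would come from the cluster estimation step rather than a fixed cutoff, and `k` could be tuned if 2 sigma proves too strict or too loose.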
Fuma HORIE Hideaki GOTO Takuo SUGANUMA
Scene character recognition has been intensively investigated for a couple of decades because it has a great potential in many applications including automatic translation, signboard recognition, and reading assistance for the visually-impaired. However, scene characters are difficult to recognize at sufficient accuracy owing to various noise and image distortions. In addition, Japanese scene character recognition is more challenging and requires a large amount of character data for training because thousands of character classes exist in the language. Some researchers proposed training data augmentation techniques using Synthetic Scene Character Data (SSCD) to compensate for the shortage of training data. In this paper, we propose a Random Filter which is a new method for SSCD generation, and introduce an ensemble scheme with the Random Image Feature (RI-Feature) method. Since there has not been a large Japanese scene character dataset for the evaluation of the recognition systems, we have developed an open dataset JPSC1400, which consists of a large number of real Japanese scene characters. It is shown that the accuracy has been improved from 70.9% to 83.1% by introducing the RI-Feature method to the ensemble scheme.
The natural gradient descent is an optimization method for real-valued neural networks that was proposed from the viewpoint of information geometry. Here, we present an extension of the natural gradient descent to complex-valued neural networks. Our idea is to use the Hermitian extension of the Fisher information matrix. Moreover, we generalize the projected natural gradient (PRONG), which is a fast natural gradient descent algorithm, to complex-valued neural networks. We also consider the advantage of complex-valued neural networks over real-valued neural networks. A useful property of complex numbers in the complex plane is that the rotation is simply expressed by the multiplication. By focusing on this property, we construct the output function of complex-valued neural networks, which is invariant even if the input is changed to its rotated value. Then, our complex-valued neural network can learn rotated data without data augmentation. Finally, through simulation of online character recognition, we demonstrate the effectiveness of the proposed approach.
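The rotation property mentioned above can be checked numerically. The sketch below assumes a single complex linear unit whose output is the magnitude of the weighted sum; this is an illustrative rotation-invariant output, not necessarily the output function constructed in the paper. The names `rotate` and `invariant_output` and all numeric values are hypothetical.

```python
import cmath

def rotate(z, theta):
    """Rotate a complex number by theta radians via multiplication by exp(i * theta)."""
    return z * cmath.exp(1j * theta)

def invariant_output(weights, inputs):
    """Magnitude of a complex weighted sum; unchanged when all inputs share one rotation."""
    return abs(sum(w * z for w, z in zip(weights, inputs)))

weights = [0.5 + 0.2j, -0.3 + 0.8j]
inputs = [1.0 + 2.0j, -0.5 + 0.4j]
theta = 0.7  # arbitrary rotation angle in radians

rotated = [rotate(z, theta) for z in inputs]
original_out = invariant_output(weights, inputs)
rotated_out = invariant_output(weights, rotated)
print(original_out, rotated_out)  # the two values agree up to rounding error
```

Because the weighted sum is linear, a common factor exp(i * theta) on every input passes straight through to the sum, and taking the magnitude removes it.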
Vince Jebryl MONTERO Yong-Jin JEONG
This paper presents an approach for developing an algorithm for an automatic license plate recognition (ALPR) system on complex scenes. A plate-style classification method is also proposed to address the inherent challenges for ALPR in a system that uses multiple plate styles (e.g., different fonts, multiple plate layouts, variations in character sequences), which is the case in the current Philippine license plate system. Methods are proposed for each ALPR module: plate detection, character segmentation, and character recognition. K-nearest neighbor (KNN) is used as a classifier for character recognition together with a proposed confidence scoring to rate the decision made by the classifier. A small dataset of Philippine license plates but with relevant features of complex scenarios for ALPR is prepared. Using the proposed system on the prepared dataset, the performance of the system is evaluated on different categories of complex scenes. The proposed algorithm structure shows promising results and yields an overall accuracy higher than that of the existing ALPR systems on the dataset consisting mostly of complex scenes.
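As a rough illustration of pairing a KNN classifier with a confidence score, the sketch below uses the winning label's share of the k votes as the score. The abstract does not define the proposed scoring, so this vote-fraction rule, the function name `knn_with_confidence`, and the toy 2-D features are all assumptions.

```python
from collections import Counter
from math import dist

def knn_with_confidence(train, query, k=3):
    """Return (label, confidence); confidence is the winning label's share of the k votes."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k

# Hypothetical 2-D feature vectors for two character classes.
train = [
    ((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((0.2, 0.1), "A"),
    ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((1.1, 0.9), "B"),
]

label, confidence = knn_with_confidence(train, (0.1, 0.1), k=3)
ambiguous_label, ambiguous_conf = knn_with_confidence(train, (0.55, 0.55), k=3)
print(label, confidence)
print(ambiguous_label, ambiguous_conf)
```

A low score on a query between the two classes could be used to flag the decision for rejection or a second-stage check.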
Zhong ZHANG Hong WANG Shuang LIU Tariq S. DURRANI
A rich and robust representation for scene characters plays a significant role in automatically understanding the text in images. In this letter, we focus on the issue of feature representation, and propose a novel encoding method named bilateral convolutional activations encoded with Fisher vectors (BCA-FV) for scene character recognition. Concretely, we first extract convolutional activation descriptors from convolutional maps and then build a bilateral convolutional activation map (BCAM) to capture the relationship between the convolutional activation response and the spatial structure information. Finally, in order to obtain the global feature representation, the BCAM is injected into FV to encode convolutional activation descriptors. Hence, the BCA-FV can effectively integrate the prominent features and spatial structure information for character representation. We verify our method on two widely used databases (ICDAR2003 and Chars74K), and the experimental results demonstrate that our method achieves better results than the state-of-the-art methods. In addition, we further validate the proposed BCA-FV on the “Pan+ChiPhoto” database for Chinese scene character recognition, and the experimental results show the good generalization ability of the proposed BCA-FV.
Zhong ZHANG Hong WANG Shuang LIU Liang ZHENG
Feature representation, as a key component of scene character recognition, has been widely studied and a number of effective methods have been proposed. In this letter, we propose a novel method named coupled spatial learning (CSL) for scene character representation. Different from the existing methods, the proposed CSL method simultaneously discovers the spatial context in both the dictionary learning and coding stages. Concretely, we propose to build the spatial dictionary by preserving the corresponding positions of the codewords. Correspondingly, we introduce the spatial coding strategy which utilizes the spatiality regularization to consider the relationship among features in the Euclidean space. Based on the spatial dictionary and spatial coding, the spatial context can be effectively integrated in the visual representations. We verify our method on two widely used databases (ICDAR2003 and Chars74k), and the experimental results demonstrate that our method achieves competitive results compared with the state-of-the-art methods. In addition, we further validate the proposed CSL method on the Caltech-101 database for an image classification task, and the experimental results show the good generalization ability of the proposed CSL.
Yuechan HAO Bilan ZHU Masaki NAKAGAWA
This paper describes a significantly improved recognition system for on-line handwritten Japanese text free from line direction and character orientation constraints. The recognition system separates handwritten text of arbitrary character orientation and line direction into text line elements, estimates and normalizes character orientation and line direction, applies two-stage over-segmentation, constructs a segmentation-recognition candidate lattice and evaluates the likelihood of candidate segmentation-recognition paths by combining the scores of character recognition, geometric features and linguistic context. Enhancements over previous systems are made in line segmentation, over-segmentation and context integration model. The results of experiments on text from the HANDS-Kondate_t_bf-2001-11 database demonstrate significant improvements in the character recognition rate compared with the previous systems. Its recognition rate on text of arbitrary character orientation and line direction is now comparable with that possible on horizontal text with normal character orientation. Moreover, its recognition speed and memory requirement do not limit the platforms or applications that employ the recognition system.
Koichi KISE Shinichiro OMACHI Seiichi UCHIDA Masakazu IWAMURA Marcus LIWICKI
This paper reviews several trials of re-designing a conventional communication medium, i.e., characters, to enrich their functions by using data-embedding techniques. For example, characters are re-designed to have better machine-readability even under various geometric distortions by embedding a geometric invariant into each character image to represent the class label of the character. Another example is to embed various information into a handwriting trajectory by using a new pen device, called a data-embedding pen. An experimental result showed that we can embed 32-bit information into a handwritten line of 5 cm length by using the pen device. In addition to those applications, we also discuss the relationship between data-embedding and pattern recognition from a theoretical point of view. Several theories indicate that if we have appropriate supplementary information by data-embedding, we can enhance pattern recognition performance up to 100%.
Song GAO Chunheng WANG Baihua XIAO Cunzhao SHI Wen ZHOU Zhong ZHANG
In this paper, we propose a representation method based on local spatial strokes for scene character recognition. High-level semantic information, namely the co-occurrence of several strokes, is incorporated by learning a sparse dictionary, which can further restrain the noise brought by single stroke detectors. The encouraging results outperform state-of-the-art algorithms.
Song GAO Chunheng WANG Baihua XIAO Cunzhao SHI Wen ZHOU Zhong ZHANG
This paper tries to model spatial layout beyond the traditional spatial pyramid (SP) in the coding/pooling scheme for scene text character recognition. Specifically, we propose a novel method to build a dictionary called spatiality embedded dictionary (SED) in which each codeword represents a particular character stroke and is associated with a local response region. The promising results outperform other state-of-the-art algorithms.
Cheng CHENG Bilan ZHU Masaki NAKAGAWA
This paper presents an approach based on character recognition to searching for keywords in on-line handwritten Japanese text. It employs an on-line character classifier and an off-line classifier or a combined classifier, which produce recognition candidates, and it searches for keywords in the lattice of candidates. It integrates scores to individually recognize characters and their geometric context. We use quadratic discriminant function (QDF) or support vector machine (SVM) models to evaluate the geometric features of individual characters and the relationships between characters. This paper also presents an approach based on feature matching that employs on-line or off-line features. We evaluate three recognition-based methods, two feature-matching-based methods, as well as ideal cases of the latter, and conclude that the approach based on character recognition outperforms that based on feature matching.
Xue GAO Jinzhi GUO Lianwen JIN
Linear Discriminant Analysis (LDA) is one of the most popular dimensionality reduction techniques in existing handwritten Chinese character (HCC) recognition systems. However, when used for unconstrained handwritten Chinese character recognition, the traditional LDA algorithm is prone to two problems, namely, the class separation problem and multimodal sample distributions. To deal with these problems, we propose a new locally linear discriminant analysis (LLDA) method for handwritten Chinese character recognition. Our algorithm operates as follows. (1) Using the clustering algorithm, find clusters for the samples of each class. (2) Find the nearest neighboring clusters from the remaining classes for each cluster of one class. Then, use the corresponding cluster means to compute the between-class scatter matrix in LDA while keeping the within-class scatter matrix unchanged. (3) Finally, apply feature vector normalization to further alleviate the class separation problem. A series of experiments on both the HCL2000 and CASIA Chinese character handwriting databases show that our method can effectively improve recognition performance, with a reduction in error rate of 28.7% (HCL2000) and 16.7% (CASIA) compared with the traditional LDA method. Our algorithm also outperforms DLA (Discriminative Locality Alignment, one of the representative manifold learning-based dimensionality reduction algorithms proposed recently). Large-set handwritten Chinese character recognition experiments also verified the effectiveness of our proposed approach.
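Step (2) above can be sketched as follows, assuming the per-class cluster means from step (1) are already computed. The unweighted sum of outer products over each cluster's k nearest rival clusters is an assumption; the abstract does not give the exact formula or weighting. The function name `local_between_scatter` and the toy means are hypothetical.

```python
from math import dist

def local_between_scatter(cluster_means, k=1):
    """Build a between-class scatter matrix from neighboring cluster means.

    cluster_means maps a class label to a list of cluster mean vectors.
    For every cluster, the outer product of the difference between its mean
    and each of its k nearest cluster means from other classes is accumulated.
    """
    dim = len(next(iter(cluster_means.values()))[0])
    scatter = [[0.0] * dim for _ in range(dim)]
    for label, means in cluster_means.items():
        # Candidate neighbors: every cluster mean that belongs to another class.
        rivals = [m for other, ms in cluster_means.items() if other != label for m in ms]
        for m in means:
            for r in sorted(rivals, key=lambda v: dist(m, v))[:k]:
                d = [a - b for a, b in zip(m, r)]
                for i in range(dim):
                    for j in range(dim):
                        scatter[i][j] += d[i] * d[j]
    return scatter

# Toy example: class_1 is bimodal (two clusters), class_2 has one cluster.
cluster_means = {
    "class_1": [(0.0, 0.0), (4.0, 0.0)],
    "class_2": [(1.0, 0.0)],
}
s_b = local_between_scatter(cluster_means, k=1)
print(s_b)
```

Using cluster means instead of a single class mean lets a multimodal class contribute one term per mode, which is the motivation the abstract gives for the method.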
Latsamy SAYSOURINHONG Bilan ZHU Masaki NAKAGAWA
This paper describes on-line recognition of handwritten Lao characters by adopting a Markov random field (MRF). The character set to recognize includes consonants, vowels and tone marks, 52 characters in total. The method extracts feature points along the pen-tip trace from pen-down to pen-up, and then sets each feature point from an input pattern as a site and each state from a character class as a label. It recognizes an input pattern by using a linear-chain MRF model to assign labels to the sites of the input pattern. It employs the coordinates of feature points as unary features and the transitions of the coordinates between the neighboring feature points as binary features. An evaluation on the Lao character pattern database demonstrates the robustness of our proposed method, with a recognition rate of 92.41% and a respectable recognition time of less than a second per character.
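Assigning labels to the sites of a linear chain can be done with dynamic programming. The sketch below is a generic minimum-cost Viterbi search, not the authors' implementation: the squared-distance unary cost, the left-to-right transition penalty, the state prototypes, and the name `viterbi_min_cost` are all illustrative assumptions.

```python
def viterbi_min_cost(n_sites, labels, unary, binary):
    """Minimum-cost labeling of a linear chain.

    unary(i, label) is the cost of giving site i that label;
    binary(prev_label, label) is the cost of the transition between neighbors.
    """
    best = [{lab: unary(0, lab) for lab in labels}]
    back = []
    for i in range(1, n_sites):
        costs, pointers = {}, {}
        for lab in labels:
            prev = min(labels, key=lambda p: best[-1][p] + binary(p, lab))
            costs[lab] = best[-1][prev] + binary(prev, lab) + unary(i, lab)
            pointers[lab] = prev
        best.append(costs)
        back.append(pointers)
    # Trace back from the cheapest final label.
    last = min(labels, key=lambda lab: best[-1][lab])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1], best[-1][last]

# Hypothetical feature points of an input stroke and state prototypes of one class.
points = [(0.0, 0.0), (0.1, 0.9), (1.0, 1.0)]
prototypes = {"s0": (0.0, 0.0), "s1": (0.0, 1.0), "s2": (1.0, 1.0)}
labels = list(prototypes)

def unary(i, lab):
    """Squared distance between feature point i and the state's prototype."""
    (x, y), (px, py) = points[i], prototypes[lab]
    return (x - px) ** 2 + (y - py) ** 2

def binary(prev, lab):
    """Allow staying in a state or advancing by one; penalize anything else."""
    step = labels.index(lab) - labels.index(prev)
    return 0.0 if step in (0, 1) else 10.0

path, cost = viterbi_min_cost(len(points), labels, unary, binary)
print(path, cost)
```

In a recognizer, this search would be run once per character class, and the class with the lowest total cost would be chosen.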
Masako OMACHI Shinichiro OMACHI
Precise estimation of data distribution with a small number of sample patterns is an important and challenging problem in the field of statistical pattern recognition. In this paper, we propose a novel method for estimating multimodal data distribution based on the Gaussian mixture model. In the proposed method, multiple random vectors are generated after classifying the elements of the feature vector into subsets so that there is no correlation between any pair of subsets. The Gaussian mixture model for each subset is then constructed independently. As a result, the constructed model is represented as the product of the Gaussian mixture models of marginal distributions. To make the classification of the elements effective, a graph cut technique is used for rearranging the elements of the feature vectors to gather elements with a high correlation into the same subset. The proposed method is applied to a character recognition problem that requires high-dimensional feature vectors. Experiments with a public handwritten digit database show that the proposed method improves the accuracy of classification. In addition, the effect of classifying the elements of the feature vectors is shown by visualizing the distribution.
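The product form described above can be written out as follows. The notation is assumed for illustration and does not come from the paper: the feature vector x is split into K mutually uncorrelated subsets x^{(1)}, ..., x^{(K)}, and subset k is modeled by a mixture of M_k Gaussians with weights \pi_{kj}, means \mu_{kj}, and covariances \Sigma_{kj}.

```latex
p(\mathbf{x}) \;=\; \prod_{k=1}^{K} p_k\!\left(\mathbf{x}^{(k)}\right),
\qquad
p_k\!\left(\mathbf{x}^{(k)}\right) \;=\; \sum_{j=1}^{M_k} \pi_{kj}\,
\mathcal{N}\!\left(\mathbf{x}^{(k)};\, \boldsymbol{\mu}_{kj},\, \boldsymbol{\Sigma}_{kj}\right),
\qquad
\sum_{j=1}^{M_k} \pi_{kj} = 1 .
```

Each factor involves only the dimensions of its own subset, so every covariance matrix is much smaller than one for the full feature vector, which is what makes estimation from few samples more tractable.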
This paper describes a method of producing segmentation point candidates for on-line handwritten Japanese text by a support vector machine (SVM) to improve text recognition. This method extracts multi-dimensional features from on-line strokes of handwritten text and applies the SVM to the extracted features to produce segmentation point candidates. We incorporate the method into the segmentation-by-recognition scheme based on a stochastic model which evaluates the likelihood composed of character pattern structure, character segmentation, character recognition and context to finally determine segmentation points and recognize handwritten Japanese text. This paper also shows the details of generating segmentation point candidates in order to achieve a high discrimination rate by finding the optimal combination of the segmentation threshold and the concatenation threshold. We compare the method for segmentation by the SVM with that by a neural network (NN) using the database HANDS-Kondate_t_bf-2001-11 and show that the SVM-based method yields a better segmentation rate and character recognition rate.
Shinichiro OMACHI Shunichi MEGAWA Hirotomo ASO
A practical optical character reader is required to deal with not only common fonts but also complex designed fonts. However, recognizing various kinds of decorative character images is still a challenging problem in the field of document image analysis. Since appearances of such decorative characters are complicated, most general character recognition systems cannot give good performances on decorative characters. In this paper, an algorithm that recognizes decorative characters by structural analysis using a graph-matching technique is proposed. Character structure is extracted by using topographical features of multi-scale images, and the extracted structure is represented by a graph. A character image is recognized by matching graphs of the input and standard patterns. Experimental results show the effectiveness of the proposed algorithm.
We propose a learning method combining query learning and a "genetic translator" we previously developed. Query learning is a useful technique for high-accuracy, high-speed learning and reduction of training sample size. However, it has not been applied to practical optical character readers (OCRs) because human beings cannot recognize queries as character images in the feature space used in practical OCR devices. We previously proposed a character image reconstruction method using a genetic algorithm. This method is applied as a "translator" from feature space for query learning of character recognition. The results of an experiment with hand-written numeral recognition show the possibility of training sample size reduction.
Masaki NAKAGAWA Bilan ZHU Motoki ONUMA
This paper presents a model and its effect for on-line handwritten Japanese text recognition free from line-direction constraints and writing format constraints such as character writing boxes or ruled lines. The model evaluates the likelihood composed of character segmentation, character recognition, character pattern structure and context. The likelihood of character pattern structure considers the plausible height, width and inner gaps within a character pattern that appear in Chinese characters composed of multiple radicals (subpatterns). The recognition system incorporating this model separates freely written text into text line elements, estimates the average character size of each element, hypothetically segments it into characters using geometric features, applies character recognition to segmented patterns and employs the model to search for the text interpretation that maximizes likelihood as Japanese text. We show the effectiveness of the model through recognition experiments and clarify how the newly modeled factors in the likelihood affect the overall recognition rate.
Motoki ONUMA Akihito KITADAI Bilan ZHU Masaki NAKAGAWA
This paper describes an on-line handwritten Japanese text recognition system that is liberated from constraints on line direction and character orientation. The recognition system first separates freely written text into text line elements, second estimates the line direction and character orientation using the time sequence information of pen-tip coordinates, and third hypothetically segments each element into characters using geometric features and applies character recognition. The final step is to select the most plausible interpretation by evaluating the likelihood composed of character segmentation, character recognition, character pattern structure and context. The method can cope with a mixture of vertical, horizontal and skewed text lines with arbitrary character orientations. It is expected to be useful for tablet PCs, interactive electronic whiteboards and so on.