Current Status and Future Prospects of Camera-Based Character Recognition and Document Image Analysis

Σχετικά έγγραφα
Robust Feature Extraction Method Based on Run-Length Compensation for Degraded Character Recognition

Area Location and Recognition of Video Text Based on Depth Learning Method

n 1 n 3 choice node (shelf) choice node (rough group) choice node (representative candidate)

Speeding up the Detection of Scale-Space Extrema in SIFT Based on the Complex First Order System

An Automatic Modulation Classifier using a Frequency Discriminator for Intelligent Software Defined Radio

Identifying Scenes with the Same Person in Video Content on the Basis of Scene Continuity and Face Similarity Measurement

Anomaly Detection with Neighborhood Preservation Principle

ΕΥΡΕΣΗ ΤΟΥ ΔΙΑΝΥΣΜΑΤΟΣ ΘΕΣΗΣ ΚΙΝΟΥΜΕΝΟΥ ΡΟΜΠΟΤ ΜΕ ΜΟΝΟΦΘΑΛΜΟ ΣΥΣΤΗΜΑ ΟΡΑΣΗΣ

IPSJ SIG Technical Report Vol.2014-CE-127 No /12/6 CS Activity 1,a) CS Computer Science Activity Activity Actvity Activity Dining Eight-He

Schedulability Analysis Algorithm for Timing Constraint Workflow Models

MIDI [8] MIDI. [9] Hsu [1], [2] [10] Salamon [11] [5] Song [6] Sony, Minato, Tokyo , Japan a) b)

Detection and Recognition of Traffic Signal Using Machine Learning

2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems

No. 7 Modular Machine Tool & Automatic Manufacturing Technique. Jul TH166 TG659 A

Adaptive grouping difference variation wolf pack algorithm

{takasu, Conditional Random Field

Buried Markov Model Pairwise

[1] DNA ATM [2] c 2013 Information Processing Society of Japan. Gait motion descriptors. Osaka University 2. Drexel University a)

Gaze Estimation from Low Resolution Images Insensitive to Segmentation Error

Ανάλυση σχημάτων βασισμένη σε μεθόδους αναζήτησης ομοιότητας υποακολουθιών (C589)

Development of a Seismic Data Analysis System for a Short-term Training for Researchers from Developing Countries

3: A convolution-pooling layer in PS-CNN 1: Partially Shared Deep Neural Network 2.2 Partially Shared Convolutional Neural Network 2: A hidden layer o

Maxima SCORM. Algebraic Manipulations and Visualizing Graphs in SCORM contents by Maxima and Mashup Approach. Jia Yunpeng, 1 Takayuki Nagai, 2, 1

HOSVD. Higher Order Data Classification Method with Autocorrelation Matrix Correcting on HOSVD. Junichi MORIGAKI and Kaoru KATAYAMA

Automatic extraction of bibliography with machine learning

(Υπογραϕή) (Υπογραϕή) (Υπογραϕή)


Method to Distinguish between Handwritten and Machine-printed Characters Inspired by Human Vision System

ΕΡΕΥΝΗΤΙΚΑ ΠΡΟΓΡΑΜΜΑΤΑ ΑΡΧΙΜΗΔΗΣ ΕΝΙΣΧΥΣΗ ΕΡΕΥΝΗΤΙΚΩΝ ΟΜΑΔΩΝ ΣΤΟ ΤΕΙ ΣΕΡΡΩΝ. Ενέργεια στ ΘΕΜΑ ΕΡΕΥΝΑΣ: ΔΙΑΡΘΡΩΣΗ ΠΕΡΙΕΧΟΜΕΝΟΥ ΕΧΡΩΜΩΝ ΕΓΓΡΑΦΩΝ

CSJ. Speaker clustering based on non-negative matrix factorization using i-vector-based speaker similarity

Physical and Chemical Properties of the Nest-site Beach of the Horseshoe Crab Rehabilitated by Sand Placement

ER-Tree (Extended R*-Tree)

Resurvey of Possible Seismic Fissures in the Old-Edo River in Tokyo

Scrub Nurse Robot: SNR. C++ SNR Uppaal TA SNR SNR. Vain SNR. Uppaal TA. TA state Uppaal TA location. Uppaal

Retrieval of Seismic Data Recorded on Open-reel-type Magnetic Tapes (MT) by Using Existing Devices

ΦΩΤΟΓΡΑΜΜΕΤΡΙΚΕΣ ΚΑΙ ΤΗΛΕΠΙΣΚΟΠΙΚΕΣ ΜΕΘΟΔΟΙ ΣΤΗ ΜΕΛΕΤΗ ΘΕΜΑΤΩΝ ΔΑΣΙΚΟΥ ΠΕΡΙΒΑΛΛΟΝΤΟΣ

ΔΙΠΛΩΜΑΤΙΚΕΣ ΕΡΓΑΣΙΕΣ

SocialDict. A reading support tool with prediction capability and its extension to readability measurement

Quick algorithm f or computing core attribute

Nov Journal of Zhengzhou University Engineering Science Vol. 36 No FCM. A doi /j. issn

Bundle Adjustment for 3-D Reconstruction: Implementation and Evaluation

Vol. 31,No JOURNAL OF CHINA UNIVERSITY OF SCIENCE AND TECHNOLOGY Feb

Εφαρμογή Υπολογιστικών Τεχνικών στην Γεωργία

Reading Order Detection for Text Layout Excluded by Image

EM Baum-Welch. Step by Step the Baum-Welch Algorithm and its Application 2. HMM Baum-Welch. Baum-Welch. Baum-Welch Baum-Welch.

Analysis of prosodic features in native and non-native Japanese using generation process model of fundamental frequency contours

GPU. CUDA GPU GeForce GTX 580 GPU 2.67GHz Intel Core 2 Duo CPU E7300 CUDA. Parallelizing the Number Partitioning Problem for GPUs

Architecture for Visualization Using Teacher Information based on SOM

ΣΤΟΙΧΕΙΑ ΠΡΟΤΕΙΝΟΜΕΝΟΥ ΕΞΩΤΕΡΙΚΟΥ ΕΜΠΕΙΡΟΓΝΩΜΟΝΟΣ Προσωπικά Στοιχεία:

Mapping Textures on 3D Geometric Model Using Reflectance Image

Optimization, PSO) DE [1, 2, 3, 4] PSO [5, 6, 7, 8, 9, 10, 11] (P)

Toward a SPARQL Query Execution Mechanism using Dynamic Mapping Adaptation -A Preliminary Report- Takuya Adachi 1 Naoki Fukuta 2.

ΓΙΑΝΝΟΥΛΑ Σ. ΦΛΩΡΟΥ Ι ΑΚΤΟΡΑΣ ΤΟΥ ΤΜΗΜΑΤΟΣ ΕΦΑΡΜΟΣΜΕΝΗΣ ΠΛΗΡΟΦΟΡΙΚΗΣ ΤΟΥ ΠΑΝΕΠΙΣΤΗΜΙΟΥ ΜΑΚΕ ΟΝΙΑΣ ΒΙΟΓΡΑΦΙΚΟ ΣΗΜΕΙΩΜΑ

Πτυχιακή Εργασι α «Εκτι μήσή τής ποιο τήτας εικο νων με τήν χρή σή τεχνήτων νευρωνικων δικτυ ων»

High order interpolation function for surface contact problem

Arbitrage Analysis of Futures Market with Frictions

Research of Han Character Internal Codes Recognition Algorithm in the Multi2lingual Environment

Development of the Nursing Program for Rehabilitation of Woman Diagnosed with Breast Cancer

Διπλωματική Εργασία του φοιτητή του Τμήματος Ηλεκτρολόγων Μηχανικών και Τεχνολογίας Υπολογιστών της Πολυτεχνικής Σχολής του Πανεπιστημίου Πατρών

, Evaluation of a library against injection attacks

( ) , ) , ; kg 1) 80 % kg. Vol. 28,No. 1 Jan.,2006 RESOURCES SCIENCE : (2006) ,2 ,,,, ; ;

«ΑΝΑΠΣΤΞΖ ΓΠ ΚΑΗ ΥΩΡΗΚΖ ΑΝΑΛΤΖ ΜΔΣΔΩΡΟΛΟΓΗΚΩΝ ΓΔΓΟΜΔΝΩΝ ΣΟΝ ΔΛΛΑΓΗΚΟ ΥΩΡΟ»

ΒΙΟΓΡΑΦΙΚΟ ΣΗΜΕΙΩΜΑ ΛΕΩΝΙΔΑΣ Α. ΣΠΥΡΟΥ Διδακτορικό σε Υπολογιστική Εμβιομηχανική, Τμήμα Μηχανολόγων Μηχανικών, Πανεπιστήμιο Θεσσαλίας.

ΤΕΧΝΟΛΟΓΙΚΟ ΕΚΠΑΙΔΕΥΤΙΚΟ ΙΔΡΥΜΑ ΚΡΗΤΗΣ ΣΧΟΛΗ ΔΙΟΙΚΗΣΗΣ ΚΑΙ ΟΙΚΟΝΟΜΙΑΣ (ΣΔΟ) ΤΜΗΜΑ ΛΟΓΙΣΤΙΚΗΣ ΚΑΙ ΧΡΗΜΑΤΟΟΙΚΟΝΟΜΙΚΗΣ

ΠΑΡΑΔΟΤΕΟ 3.1 : Έκθεση καταγραφής χρήσεων γης

Toward the Quantitative Study of Hydrothermal Systems An Approach to Understand Hydrothermal Systems

Ανάλυση Προτιμήσεων για τη Χρήση Συστήματος Κοινόχρηστων Ποδηλάτων στην Αθήνα

Ερευνητική+Ομάδα+Τεχνολογιών+ Διαδικτύου+

HIV HIV HIV HIV AIDS 3 :.1 /-,**1 +332

Probabilistic Approach to Robust Optimization

Re-Pair n. Re-Pair. Re-Pair. Re-Pair. Re-Pair. (Re-Merge) Re-Merge. Sekine [4, 5, 8] (highly repetitive text) [2] Re-Pair. Blocked-Repair-VF [7]

BCI On Feature Extraction from Multi-Channel Brain Waves Used for Brain Computer Interface

ΔΙΠΛΩΜΑΤΙΚΗ ΕΡΓΑΣΙΑ. «Προστασία ηλεκτροδίων γείωσης από τη διάβρωση»

Development of a basic motion analysis system using a sensor KINECT

Simplex Crossover for Real-coded Genetic Algolithms

Πανεπιστήµιο Πειραιώς Τµήµα Πληροφορικής


An Angulation Method for Active RFID Tag using Covariance with Known Tags

Approximation Expressions for the Temperature Integral

Evaluation of Methods to Extract Important Scenes for Automatic Digest Generation from a Presentation Video

Research on model of early2warning of enterprise crisis based on entropy

Development of a Tiltmeter with a XY Magnetic Detector (Part +)

Free-viewpoint Video Rendering in Large Outdoor Space such as Soccer Stadium based on Object Extraction and Tracking Technology

Ψηφιακό Μουσείο Ελληνικής Προφορικής Ιστορίας: πώς ένας βιωματικός θησαυρός γίνεται ερευνητικό και εκπαιδευτικό εργαλείο στα χέρια μαθητών

ΣΤΗ ΣΧΕ ΙΑΣΗ ΕΚΠΑΙ ΕΥΤΙΚΟΥ ΛΟΓΙΣΜΙΚΟΥ 1

ΕΡΕΥΝΗΤΙΚΑ ΠΡΟΓΡΑΜΜΑΤΑ ΑΡΧΙΜΗΔΗΣ ΕΝΙΣΧΥΣΗ ΕΡΕΥΝΗΤΙΚΩΝ ΟΜΑΔΩΝ ΣΤΟ ΤΕΙ ΣΕΡΡΩΝ. Ενέργεια στ ΘΕΜΑ ΕΡΕΥΝΑΣ: ΔΙΑΡΘΡΩΣΗ ΠΕΡΙΕΧΟΜΕΝΟΥ ΕΧΡΩΜΩΝ ΕΓΓΡΑΦΩΝ

Topic Structure Mining based on Wikipedia and Web Search

ΓΕΩΠΟΝΙΚΟ ΠΑΝΕΠΙΣΤΗΜΙO ΑΘΗΝΩΝ ΤΜΗΜΑ ΑΞΙΟΠΟΙΗΣΗΣ ΦΥΣΙΚΩΝ ΠΟΡΩΝ & ΓΕΩΡΓΙΚΗΣ ΜΗΧΑΝΙΚΗΣ

*,* + -+ on Bedrock Bath. Hideyuki O, Shoichi O, Takao O, Kumiko Y, Yoshinao K and Tsuneaki G

情報処理学会研究報告 IPSJ SIG Technical Report Vol.2014-MUS-104 No /8/26 1,a) Music Structure and Composition with Sound Directivity in 3D Space

Web 論 文. Performance Evaluation and Renewal of Department s Official Web Site. Akira TAKAHASHI and Kenji KAMIMURA

[2], [3], [8], [20] [4] [6], [18] [1], [11], [19] [13] [10] N SVD PCA N SVD Vasilescu Vasilescu N SVD [14] [17] Y Li [7] Y Li N SVD [12] 2,,,,, 596

Feasible Regions Defined by Stability Constraints Based on the Argument Principle

ΖΩΝΟΠΟΙΗΣΗ ΤΗΣ ΚΑΤΟΛΙΣΘΗΤΙΚΗΣ ΕΠΙΚΙΝΔΥΝΟΤΗΤΑΣ ΣΤΟ ΟΡΟΣ ΠΗΛΙΟ ΜΕ ΤΗ ΣΥΜΒΟΛΗ ΔΕΔΟΜΕΝΩΝ ΣΥΜΒΟΛΟΜΕΤΡΙΑΣ ΜΟΝΙΜΩΝ ΣΚΕΔΑΣΤΩΝ

FPGA. Variations and BTI-induced Aging Degradation on Commercial FPGAs. Shouhei ISHII and Kazutoshi KOBAYASHI, 3 FPGA JST, CREST

BoVW. (Histogram Encoding) [2], [5], [6] [7], [8], (Fisher Encoding) [3] VLAD [9] Super Vector [10] Locality Constrained [11], [12], [13]

Η Διδακτική Ενότητα «Γνωρίζω τον Υπολογιστή», στα πλαίσια των Προγραμμάτων Σπουδών της Πληροφορικής: μια Μελέτη Περίπτωσης.

J. of Math. (PRC) 6 n (nt ) + n V = 0, (1.1) n t + div. div(n T ) = n τ (T L(x) T ), (1.2) n)xx (nt ) x + nv x = J 0, (1.4) n. 6 n

CorV CVAC. CorV TU317. 1

: Monte Carlo EM 313, Louis (1982) EM, EM Newton-Raphson, /. EM, 2 Monte Carlo EM Newton-Raphson, Monte Carlo EM, Monte Carlo EM, /. 3, Monte Carlo EM

Transcript:

THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE. 599 8531 1 1 980 8579 6 6 05 812 8581 6 10 1 E-mail: {kise,masa}@cs.osakafu-u.ac.jp, machi@aso.ecei.tohoku.ac.jp, uchida@is.kyushu-u.ac.jp Current Status and Future Prospects of Camera-Based Character Recognition and Document Image Analysis Koichi KISE, Shinichiro OMACHI, Seiichi UCHIDA, and Masakazu IWAMURA Graduate School of Engineering, Osaka Prefecture University Sakai-shi, Osaka, 599 8531 Japan Graduate School of Engineering, Tohoku University 6 6 05 Aoba, Aramaki, Aoba-ku, Sendai-shi, 980 8579 Japan Faculty of Information Science and Electrical Engineering, Kyushu University Hakozaki 6-10-1, Higashi-ku, Fukuoka-shi, 812-8581 Japan E-mail: {kise,masa}@cs.osakafu-u.ac.jp, machi@aso.ecei.tohoku.ac.jp, uchida@is.kyushu-u.ac.jp Abstract Pervasive use of handy digital cameras with higher resolution is now defining new roles of character recognition and document image analysis as a mean of analyzing camera-captured images. In this report, we survey state-of-the-art of research and technologies of camera based character recognition and document image analysis. We also describe the current position and future prospects of character recognition and document image analysis in comparison with related technologies such as barcodes. In addition, we briefly introduce our research entitled embedding information on characters using cross ratios whose final goal is to make character recognition as easy and accurate as bar-code reading. Key words Camera-based, Character recognition, Document image analysis, Embedding information, Cross ratio 1. 500 300 1

Doermann [1] [2], [3] [5] [4] 2. 3. RFID 4. 5. 2. 2. 1 1 [5] A4 300dpi 400dpi 900 1500 500 300 800 2 2. 2 2. 2. 1 2 2 [6] [7] [2] 2 2 [8], [9] [10] [11] [12] [13] 2 [14] 2.3 2 [1] 2. 2. 2 ( ) (i) (ii) dewarping 2.1(1) [15] [17] 2 ( 2 ) 2.1(2) 2

[18] [19] [20], [21] Shape-from-shading [22] 2. 2. 3 ( ) 2 [23] [24] [25] [26] [23] [26] [24] [9] 2. 3 2. 3. 1 [27] 2.2 [28] [29] [11], [12] [8] [30] [31] [32] [33] [17] 2. 3. 2 2.2 (2 ) [34] [35] [36] [17] [37] [9] DP [38] [39] 2. 4 ( ) 3

dewarping dewarping [15], [16] [40] [29] 3. [41], [42] RFID(Radio Frequency Identification) [43] ( [44]) 1 1 ID RFID 1 DataGlyph [42] [2], [3] 4. 2. 1 (defect models, degradation models) [45], [46] [35], [47], [48] 2 OCR OCR 5. 3. 4. 4

1, RFID,,, 1 guide l 1 l l 1 2 l 2 l 3 l 3 p (a) (b) (a) (b) (δ = 32) recognition accuracy (%) 100 90 80 70 60 50 40 30 20 shape similarity cross ratio 0 8 16 24 32 40 48 perspective distortion, δ OCR 5. 1 4 1 ( ) 1(a) (b) 3 l 1,l 2,l 3 r =(l 1 + l 2)(l 2 + l 3)/(l 2(l 1 + l 2 + l 3)) l 1,l 2,l 3 [49] ( 1(b) p) ( l 1,l 2,l 3) r URL 5. 2 (Arial)26 200 200 l 1 15 l 2 2 l 3 l 2 + l 3 = 135, 15 < = l 2, 15 < = l 3 l 2 l 3 26 [49] 24 26 4 x, y ±δ 256 1(b) 1( ) 0( ) 2 δ 2 δ 26 2 ( ) [4] 5

6. [1] D. Doermann, J. Liang and H. Li: Progress in camerabased document image analysis, Proc. ICDAR 03, pp. 606 616 (2003). [2],, PRMU2003-229 (2004). [3],,, 45, 9, pp. 923 927 (2004). [4],,, 100%, (2005.3) [ ]. [5],, (1999). [6] O. D. Trier and T. Taxt: Evaluation of binarization methods for document images, IEEE Trans. PAMI, 17, 3, pp. 312 315 (1995). [7],,,,, (D-II), J87-D-II, 2, pp. 761 766 (2004). [8],, (D), J71-D, 6, pp. 1037 1047 (1988). [9],,,, PRMU2004-124 (2004). [10] S. Messelodi and C. M. Modena: Automatic identification and skew estimation of text linesin real scene images, Pattern Recognition, 32, pp. 791 810 (1999). [11],,, (D-II), J80-D-II, 6, pp. 1617 1626 (1997). [12] K. Wang and J. A. Kangas: Character location in scene images from digital camera, Pattern Recognition, 36, 10, pp. 2287 2299 (2003). [13],,,,,, PRMU2003-222 (2004). [14],,, (2005.3) [ ]. [15] M. Pilu: Extraction of illusory linear clues in perspectively skewed documents, Proc. CVPR 01 (2001). [16] P. Clark and M. Mirmehdi: Recognising text in real scenes, IJDAR, 4, pp. 243 257 (2002). [17] X. Chen, J. Yang and A. Waibel: Automatic detection and recognitionof signs from natural scenes, IEEE Trans. Image Processing, 13, 1, pp. 87 99 (2004). [18] M. S. Brown and W. B. Seales: Image restoration ofarbitrarily warped documents, IEEE Trans. PAMI, 26, 10, pp. 1295 1306 (2004). [19] C. Wu and G. Agam: Document image de-warping for text/graphics recognition, Lecture Notes in Computer Science (Joint IAPR International Workshops SSPR 2002 and SPR 2002), Vol. 2396, pp. 348 357 (2002). [20] H. Cao, X. Ding and C. Liu: Rectifying the bound document image captured by the camera: a model based approach, Proc. ICDAR 03, pp. 71 75 (2003). [21] T. Kanungo, R. M. Haralick and I. Phillips: Global and local document degradation models, Proc. ICDAR 93, pp. 730 734 (1993). [22],, 3 i shape from shading, (D-II), J77-D-II, 6, pp. 1059 1067 (1994). [23] H. Li and D. Doermann: Text enhancement in digital video using multiple frame integration, Proc. ACM Multimedia, pp. 19 22 (1999). [24],,,,,,,, PRMU2003-223 (2004). [25],,,,,, PRMU97-221 (1998). [26] A. Zappalà, A. Gee and M. Taylor: Document mosaicing, Image and Vision Computing, 17, 8, pp. 589 595 (1999). [27] K. Yamada, K. Ishikawa and N. Nakajima: A method of analyzing the handling of paper documents in motion images, Proc. ICPR 00, Vol. 4, pp. 413 416 (2000). [28],,,,,, (D-II), J81-D-II, 10, pp. 2280 2287 (1998). [29],,,,,, 31, 4,pp. 542 552 (2002). [30],,,, PRMU2002-217 (2003). [31],,,, (D-II), J81-D-II, 4, pp. 641 650 (1998). [32], Log, (D-II), J87-D-II, 8, pp. 1735 1739 (2004). [33] X. Chen and A. L. Yuille: Detecting and reading text in natural scenes, Proc. CVPR 04 (2004). [34],,, PRMU2001-275 (2002). [35],,,, 54, 6, pp. 881 886 (2000). [36],,,, MIRU2004, 1, pp. 321 324 (2004). [37] Y. Kusachi, A. Suzuki, N. Ito and K. Arakawa: Kanji recognition in scene images without detection of text fieldsrobust against variation of viewpoint, contrast, and background texture -, Proc. ICPR 04 (2004). [38],,, (2005.3) [ ]. [39] S. Omachi, M. Inoue and H. Aso: Structure extraction from decorated characters using multiscale images, IEEE Trans. PAMI, 23, 3, pp. 315 322 (2001). [40],,,, (2005.3) [ ]. [41] http://www.adams1.com/pub/russadam/stack.html. [42] D. L. Hecht: Printed embedded data graphical user interfaces, IEEE Computer, 34, 3, pp. 47 55 (2001). [43],, 45, 1 (2005). [44],,,, (D-II), J87-D-II, 12, pp. 2145 2155 (2004). [45] H. S. Baird: The state of the art of document image degradation modeling, Proc. 4th IAPR International Workshop on Document Analysis Systems, pp. 1 16 (2000). [46] Y. Li, D. Lopresti, G. Nagy and A. Tomkins: Validation of image defect models for optical character recognition, IEEE Trans. PAMI, 18, 2, pp. 99 108 (1996). [47],,,,,, PRMU2004-7 (2004). [48] J. Sun, Y. Hotta, Y. Katsuyama and S. Naoi: Low resolution character recognition by dual eigenspace and synthetic degraded patterns, Proc. 1st ACM workshop on Hardcopy document processing, pp. 15 22 (2004). [49],, :,, 99-CVIM-115-13 (1999). 6