Japanese Fuzzy String Matching in Cooking Recipes

Michiko Yasukawa (Gunma University)

© 2012 Information Processing Society of Japan

In this paper, we propose Japanese fuzzy string matching for cooking recipes. Cooking recipes contain spelling variants of recipe titles and ingredient names, which cause mismatches between search queries and relevant recipe texts. To find these spelling variants, we use Japanese phonetic matching and edit distance. We evaluated the proposed methods on actual cooking recipes from the Internet and report our findings based on the evaluation results.

1.

(indexing term) 1) 2) 3) 4) 5) 6) 7) 8) 9) 10) 11)

2.

2.1 (phonetic matching)

Soundex 12) Metaphone 13) 1 14)

Soundex uses 6 codes (123456); Metaphone uses 16 codes (0BFHJKLMNPRSTWXY).
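To make the six-code scheme concrete, here is a minimal sketch of the commonly described American Soundex rules in Python. It is an illustration only, not the implementation used in the paper; the function name `soundex` and the handling of non-letter characters are assumptions.

```python
def soundex(name: str) -> str:
    """Simplified American Soundex: first letter plus three digits."""
    # The six digit codes (1-6) and the consonants assigned to each.
    codes = {}
    for digit, letters in enumerate(
        ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1
    ):
        for ch in letters:
            codes[ch] = str(digit)

    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        code = codes.get(ch, "")
        # Emit a digit only when it differs from the previous code.
        if code and code != prev:
            result += code
        # H and W do not separate equal codes; vowels do.
        if ch not in "HW":
            prev = code
    # Pad with zeros and truncate to four characters.
    return (result + "000")[:4]


print(soundex("SMITH"), soundex("SMYTH"))
```

Both names receive the same key, which is the behaviour phonetic matching relies on.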

Table 1 Editex Letter Groups.

Code | Letters
0 | aeiouy
1 | bp
2 | ckq
3 | dt
4 | lr
5 | mn
6 | gj
7 | fpv
8 | sxz
9 | csz

SMITH, SMYTH: Soundex S530; Metaphone SM0.

2.2 (approximate string matching)

(edit distance) 15) x y d(x, y) x y 2 2 1

Zobel 16) Editex (Table 1).

Edit distance: d(sip, zip) = d(sip, lip) = 2.
Editex: d(sip, zip) = 1, d(sip, lip) = 2.

3.

3.1

17) (Table 2)

Table 2 The Japanese Syllabary (Fifty Sounds).

Row | No. | Hiragana (A I U E O) | Katakana (A I U E O) | Reading
ϕ | 1 | あ い う え お | ア イ ウ エ オ | a i u e o
K | 2 | か き く け こ | カ キ ク ケ コ | ka ki ku ke ko
S | 3 | さ し す せ そ | サ シ ス セ ソ | sa si su se so
T | 4 | た ち つ て と | タ チ ツ テ ト | ta ti tu te to
N | 5 | な に ぬ ね の | ナ ニ ヌ ネ ノ | na ni nu ne no
H | 6 | は ひ ふ へ ほ | ハ ヒ フ ヘ ホ | ha hi hu he ho
M | 7 | ま み む め も | マ ミ ム メ モ | ma mi mu me mo
Y | 8 | や - ゆ - よ | ヤ - ユ - ヨ | ya yu yo
R | 9 | ら り る れ ろ | ラ リ ル レ ロ | ra ri ru re ro
W | 10 | わ ゐ - ゑ を | ワ ヰ - ヱ ヲ | wa wi we wo
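The sip/zip/lip example from Section 2.2 can be reproduced with a small dynamic-programming sketch in Python. This is a simplified Editex-style distance, not the full algorithm of Zobel and Dart: substitution costs 0 for identical letters, 1 for letters sharing a group in Table 1, and 2 otherwise, while insertion and deletion are charged a flat cost of 2. The flat insertion/deletion cost and the names `sub_cost` and `editex_like` are assumptions made for illustration.

```python
# Letter groups from Table 1 (codes 0-9).
GROUPS = ["aeiouy", "bp", "ckq", "dt", "lr", "mn", "gj", "fpv", "sxz", "csz"]


def sub_cost(a: str, b: str) -> int:
    """0 if equal, 1 if both letters share a Table 1 group, else 2."""
    if a == b:
        return 0
    if any(a in g and b in g for g in GROUPS):
        return 1
    return 2


def editex_like(x: str, y: str, indel: int = 2) -> int:
    """Simplified Editex-style distance (flat insertion/deletion cost)."""
    x, y = x.lower(), y.lower()
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = [j * indel for j in range(len(y) + 1)]
    for i, a in enumerate(x, start=1):
        cur = [i * indel]
        for j, b in enumerate(y, start=1):
            cur.append(
                min(
                    prev[j] + indel,               # delete a
                    cur[j - 1] + indel,            # insert b
                    prev[j - 1] + sub_cost(a, b),  # substitute a -> b
                )
            )
        prev = cur
    return prev[-1]


print(editex_like("sip", "zip"), editex_like("sip", "lip"))
```

Because s and z share a letter group while s and l do not, the first pair is scored as closer than the second.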

Tables 3, 4, 5 and 6 give the encoding tables for the four methods (jppm1, jppm2, jppm3, jppm4).

2 (DF ) jppm1 jppm2 jppm3 jppm1 jppm1 jppm4 jppm2 jppm2 jppm1 jppm4 jppm2 7 jppm2

Table 3 Encoding Table for Japanese Phonetic Matching (jppm1).

Fifty Sounds [in] | Code [out] | Voiced Sounds [in] | Code [out] | Additional Symbols [in] | Code [out]
(ϕ) | あ | | | (lower-case, ϕ) | あ
 | | | | (obs., ϕ) |
 | | | | (macron, ϕ) |
(K) | か | (G) | が | (lower-case, K) | か
(S) | さ | (Z) | ざ | (obs., Z) |
(T) | た | (D) | だ | (lower-case, T) | っ
(N) | な | | | (syllabic nasal, N) | ん
(H) | は | (B) | ば | (V) |
 | | (P) | ぱ | |
(M) | ま | | | |
(Y) | や | | | (lower-case, Y) | ゃ
(R) | ら | | | |
(W) | わ | | | (lower-case, W) | わ

3.2

Editex 1 0 9 Editex jpeditex 8 11 2 2 2 1 jpedit jpeditex jpedit jpeditex 9 jpedit

Table 4 Encoding Table for Japanese Phonetic Matching (jppm2).

Fifty Sounds [in] | Code [out] | Voiced Sounds [in] | Code [out] | Additional Symbols [in] | Code [out]
(ϕ) | | | | (lower-case, ϕ) |
 | | | | (obs., ϕ) |
 | | | | (macron, ϕ) |
(K) | か | (G) | が | (lower-case, K) |
(S) | さ | (Z) | ざ | (obs., Z) |
(T) | た | (D) | だ | (lower-case, T) |
(N) | な | | | (syllabic nasal, N) |
(H) | は | (B) | ば | (V) |
 | | (P) | ぱ | |
(M) | ま | | | |
(Y) | や | | | (lower-case, Y) |
(R) | ら | | | |
(W) | わ | | | (lower-case, W) |

Table 5 Encoding Table for Japanese Phonetic Matching (jppm3).

Fifty Sounds [in] | Code [out] | Voiced Sounds [in] | Code [out] | Additional Symbols [in] | Code [out]
(ϕ) | あ | | | (lower-case, ϕ) | あ
 | | | | (obs., ϕ) |
 | | | | (macron, ϕ) |
(K) | か | (G) | か | (lower-case, K) | か
(S) | さ | (Z) | さ | (obs., Z) |
(T) | た | (D) | た | (lower-case, T) | た
(N) | な | | | (syllabic nasal, N) | な
(H) | は | (B) | は | (V) |
 | | (P) | は | |
(M) | ま | | | |
(Y) | や | | | (lower-case, Y) | や
(R) | ら | | | |
(W) | わ | | | (lower-case, W) | わ

Table 6 Encoding Table for Japanese Phonetic Matching (jppm4).

Fifty Sounds [in] | Code [out] | Voiced Sounds [in] | Code [out] | Additional Symbols [in] | Code [out]
(ϕ) | あ | | | (lower-case, ϕ) |
 | | | | (obs., ϕ) |
 | | | | (macron, ϕ) |
(K) | か | (G) | が | (lower-case, K) | か
(S) | さ | (Z) | ざ | (obs., Z) |
(T) | た | (D) | だ | (lower-case, T) |
(N) | な | | | (syllabic nasal, N) | ん
(H) | は | (B) | ば | (V) |
 | | (P) | ぱ | |
(M) | ま | | | |
(Y) | や | | | (lower-case, Y) |
(R) | ら | | | |
(W) | わ | | | (lower-case, W) | わ

Table 7 Example of Spelling Variant Sets using Phonetic Matching.

DF jppm1 jppm2 jppm3 jppm4
12 1 1 1 33 16 2 1

4.

4.1

2 (Dataset-A, Dataset-B). Dataset-A: 3 1 Web.

1 http://www.ntv.co.jp/3min/
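As an illustration of how an encoding table of this kind can be applied, the following Python sketch folds each kana to the first kana of its row, in the spirit of Table 5 (jppm3), where voiced and unvoiced sounds, lower-case kana and the syllabic nasal share the code of their row. It is a sketch under stated assumptions, not the paper's implementation: the names `ROWS`, `to_hiragana` and `phonetic_key` are hypothetical, the V sound is folded by Unicode decomposition because the table gives no code for it, and characters with no row (such as the long-vowel mark) are passed through unchanged.

```python
import unicodedata

# Rows of the syllabary (Table 2); the first kana of each row is its code.
# The syllabic nasal is placed in the N row, as in Table 5.
ROWS = ["あいうえお", "かきくけこ", "さしすせそ", "たちつてと", "なにぬねのん",
        "はひふへほ", "まみむめも", "やゆよ", "らりるれろ", "わゐゑを"]
ROW_CODE = {kana: row[0] for row in ROWS for kana in row}

# Lower-case kana sit one code point before their full-size form.
SMALL_KANA = "ぁぃぅぇぉっゃゅょゎ"


def to_hiragana(text: str) -> str:
    """Shift katakana (U+30A1..U+30F6) down to the hiragana block."""
    return "".join(
        chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c for c in text
    )


def phonetic_key(text: str) -> str:
    """Row-level phonetic key: one code kana per input kana."""
    out = []
    # NFD splits voiced kana into base kana + combining mark.
    for ch in unicodedata.normalize("NFD", to_hiragana(text)):
        if ch in "\u3099\u309a":  # drop voiced / semi-voiced marks
            continue
        if ch in SMALL_KANA:
            ch = chr(ord(ch) + 1)  # small kana -> full-size kana
        out.append(ROW_CODE.get(ch, ch))  # unknown characters pass through
    return "".join(out)


print(phonetic_key("ベッド"), phonetic_key("ベット"))
```

Two spellings that differ only in voicing or script, such as ベッド and ベット, receive the same key and can therefore be grouped as candidate spelling variants.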

Table 8 Encoding Table for Japanese Editex.

Fifty Sounds [in] | Code [out] | Voiced Sounds [in] | Code [out] | Additional Symbols [in] | Code [out]
(ϕ) | あ | | | (lower-case, ϕ) | あ
 | | | | (obs., ϕ) |
 | | | | (macron, ϕ) |
(K) | か | (G) | か | (lower-case, K) | か
(S) | さ | (Z) | さ | (obs., Z) |
(T) | た | (D) | た | (lower-case, T) | た
(N) | な | | | (syllabic nasal, N) | ん
(H) | は | (B) | は | (V) |
 | | (P) | | |
(M) | ま | | | |
(Y) | や | | | (lower-case, Y) | や
(R) | ら | | | |
(W) | わ | | | (lower-case, W) | わ

Table 9 Example of Spelling Variant Sets using jpedit and jpeditex.

No. | (jpedit) | No. | (jpeditex)
1 | 0 | 1 | 0
2 | 2 | 2 | 1
3 | 2 | 3 | 2
4 | 4 | 4 | 3
5 | 4 | 5 | 3

Dataset-A: 1990/1/20 to 2012/7/11; 5000; HTML. Top 10 by (DF): Table 13. (DF): Fig. 1.

Dataset-B: COOKPAD 1; Web; 1998/4/21 to 2010/7/17; 80; Dataset-A; HTML. Top 10 by (DF): Table 14. (DF) 1 2 Dataset-A Dataset-B

1 http://cookpad.com/

4.2

3.1 (jppm1 to jppm4), Table 10. Dataset-A jppm1 jppm3 jppm1 jppm3 1 jppm2 jppm4 jppm2 2 3 jppm1 jppm3 jppm2 jppm4 jppm2 jppm4

1 2 jppm1 jppm4 Dataset-B (transliteration) jppm1 jppm4 Dataset-A Dataset-A Dataset-B ( ) Dataset-B 3 jppm1 3 5 5 4 jppm2 3 1 2 12 9 7 jppm1 jppm3 jppm1 jppm1 3 jppm4 8 6 6 3 jppm4 jppm2 jppm2 jppm2 jppm4 jppm1 jppm3 jppm3 jppm2 jppm4 jppm2 jppm2 jppm4 jppm2 jppm2 jppm4

Table 10 Number of Spelling Variant Sets using Phonetic Matching.

 | jppm1 | jppm2 | jppm3 | jppm4
Dataset-A ( ) | 1 | 6 | 1 | 2
Dataset-A ( ) | 7 | 2 | 11 | 16
Dataset-B ( ) | 335 | 1695 | 580 | 952
Dataset-B ( ) | 781 | 1845 | 1173 | 1270

jppm2 jppm1 jppm4 2

4.3

3.2 Dataset-B 4.2 jppm2 3 1 jpedit jpeditex 1 (DF ) 15 30 jpedit jpeditex jpedit jpeditex jppm2 jppm2 11 12 jppm2 jppm2 jpedit jpeditex jpedit jpeditex 11 1 1 2 4 12 jpedit jpeditex

5.
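One plausible way to obtain spelling variant sets like those counted in Table 10 is to group terms that share a phonetic key and keep only groups with more than one member. The Python sketch below shows that grouping step. The function `variant_sets`, the stand-in key function `fold_katakana` and the sample terms are all hypothetical, chosen only so the example is self-contained; any of the jppm encodings could be passed as `key` instead.

```python
from collections import defaultdict


def variant_sets(terms, key):
    """Group terms by key(term); keep groups with two or more spellings."""
    groups = defaultdict(set)
    for term in terms:
        groups[key(term)].add(term)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def fold_katakana(term: str) -> str:
    """Hypothetical stand-in key: map katakana to hiragana."""
    return "".join(
        chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c for c in term
    )


# Made-up sample terms, not taken from either dataset.
terms = ["トマト", "とまと", "たまご", "タマゴ", "みそ"]
sets = variant_sets(terms, fold_katakana)
print(len(sets), sets)
```

The number of sets returned corresponds to the kind of count reported per method and dataset; a coarser key merges more terms and therefore changes that count.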

Table 11 Spelling Variants of Recipe Titles with Edit/Editex.

No. | (jpedit) | No. | (jpeditex)
1 | 0 | 1 | 0
2 | 2 | 2 | 1
3 | 2 | 3 | 1
4 | 2 | 4 | 1
5 | 2 | 5 | 2
6 | 2 | 6 | 2
7 | 2 | 7 | 2
8 | 4 | 8 | 2
9 | 4 | 9 | 3
10 | 4 | 10 | 3
11 | 4 | 11 | 3
12 | 4 | 12 | 3
13 | 6 | 13 | 4
14 | 6 | 14 | 4
15 | 6 | 15 | 5

( jppm2) ( : 21700273)

Table 12 Spelling Variants of Ingredients with Edit/Editex.

No. | (jpedit) | No. | (jpeditex)
1 | 0 | 1 | 0
2 | 2 | 2 | 1
3 | 2 | 3 | 1
4 | 2 | 4 | 1
5 | 2 | 5 | 1
6 | 2 | 6 | 2
7 | 2 | 7 | 2
8 | 2 | 8 | 2
9 | 2 | 9 | 2
10 | 2 | 10 | 2
11 | 2 | 11 | 2
12 | 2 | 12 | 2
13 | 2 | 13 | 2
14 | 2 | 14 | 2
15 | 2 | 15 | 2
16 | 2 | 16 | 2
17 | 2 | 17 | 2
18 | 2 | 18 | 2
19 | 4 | 19 | 2
20 | 4 | 20 | 2
21 | 4 | 21 | 2
22 | 4 | 22 | 2
23 | 4 | 23 | 3
24 | 4 | 24 | 3
25 | 4 | 25 | 3
26 | 4 | 26 | 3
27 | 4 | 27 | 3
28 | 4 | 28 | 3
29 | 4 | 29 | 3
30 | 4 | 30 | 3

1) 2. (< > ) Vol.93, No.1, pp.33-38 (2010).
2) 3. (< > ) Vol.93, No.1, pp.39-47 (2010).
3) 4. (< > ) Vol.93, No.1, pp.48-54 (2010).
4) D-II, II- Vol.85, No.1, pp.79-89 (2002-01-01).
5) Vol.10, No.2, pp.3-17 (2003).
6) Vol.2004-NL-164, No.108, pp.117-122 (2004).
7) ( ) : Vol.22, pp.117-142 (2005).
8) HCI, Vol.2010, No.4, pp.1-7 (2010).
9) 18 pp.839-842 (2012).
10) ( 3: : (3)). HCI, Vol.2007, No.41, pp.51-57 (2007).
11) Vol.22, No.1B1-02, pp.1347-9881 (2009).
12) The U.S. National Archives and Records Administration: The Soundex Indexing System, (online), available from http://www.archives.gov/research/census/soundex.html (2007).
13) Philips, L.: The Double Metaphone Search Algorithm, C/C++ Users Journal, (online), available from http://drdobbs.com/cpp/184401251 (2000).
14) Knuth, D. E.: The Art of Computer Programming, Volume 3: Sorting and Searching, Second Edition, pp.375-376 (2004).
15) (2009).
16) Zobel, J. and Dart, P. W.: Phonetic String Matching: Lessons from Information Retrieval, SIGIR '96 Proceedings, pp.166-172 (1996).
17) Yasukawa, M., Culpepper, J.S. and Scholer, F.: Phonetic Matching in Japanese, Proceedings of SIGIR 2012 Workshop on Open Source Information Retrieval (OSIR 2012), Portland, Oregon, USA, pp.68-71 (online), available from http://opensearchlab.otago.ac.nz/ (2012).

Table 13 Top 10 Titles/Ingredients in Dataset-A.

No. | (DF ) | No. | (DF )
1 | 6 | 1 | 1882
2 | 7 | 2 | 1833
3 | 5 | 3 | 1486
4 | 5 | 4 | 1307
5 | 4 | 5 | 943
6 | 4 | 6 | 916
7 | 4 | 7 | 830
8 | 4 | 8 | 822
9 | 4 | 9 | 737
10 | 4 | 10 | 737

A.1

Fig. 1 Document Frequency (DF value) of Recipe Titles in Dataset-A.

Table 14 Top 10 Titles/Ingredients in Dataset-B.

No. | (DF ) | No. | (DF )
1 | 125 | 1 | 145661
2 | 123 | 2 | 136400
3 | 117 | 3 | 110129
4 | 117 | 4 | 100246
5 | 115 | 5 | 73506
6 | 107 | 6 | 71064
7 | 105 | 7 | 70784
8 | 105 | 8 | 58388
9 | 105 | 9 | 57340
10 | 102 | 10 | 49950

Fig. 2 Document Frequency (DF value) of Recipe Titles in Dataset-B.
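For readers unfamiliar with the DF values in these tables, document frequency is the number of documents (here, recipes) in which a term occurs at least once. The following Python sketch shows one way to compute it and list the most frequent terms. The function name `document_frequency` and the sample recipes are hypothetical placeholders, not data from Dataset-A or Dataset-B.

```python
from collections import Counter


def document_frequency(recipes):
    """Count, for each term, the number of recipes that contain it."""
    df = Counter()
    for terms in recipes:
        # A term counts once per recipe, however often it is repeated.
        df.update(set(terms))
    return df


# Made-up sample: each recipe is a list of ingredient names.
recipes = [
    ["onion", "salt", "onion"],
    ["salt", "egg", "onion"],
    ["salt"],
]
df = document_frequency(recipes)
print(df.most_common(2))
```

Sorting by this count yields a top-N listing of the same shape as Tables 13 and 14.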