DEIM Forum 2016 C8-6

CRF

700-8530 3-1-1
700-8530 3-1-1
101-8430 2-1-2
E-mail: pobp52cw@s.okayama-u.ac.jp, ohta@de.cs.okayama-u.ac.jp, {takasu, adachi}@nii.ac.jp

Keywords: Conditional Random Field

1.
Conditional Random Field [1]; Conditional Random Field (CRF) [2]; CRF [3]. Sections 2 and 3; CRF; Sections 4, 5, 6, and 7.

2.
Related studies include [4], Okada et al. [5], Peng and McCallum [6], and Councill et al. [7]. The last two use CRF [2]. [4] and Okada et al. [5] use a Support Vector Machine (SVM) [8] and a Hidden Markov Model (HMM) [9]. pdftohtml (footnote 1) is applied to PDF files; Introduction and Reference sections; 1; SVM.

1 http://pdftohtml.sourceforge.net
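HMM and SVM: 69.2%, 74.8%, 81.6%. Okada et al. [5]: tokens such as "vol.", "no.", "pp.", and "ed."; SVM and HMM; Vol.J83-DII, No.1 to No.12; 97.6%.

Peng and McCallum [6] and Councill et al. [7]: HMM, CRF. Peng and McCallum: 935 items (500 for training, 435 for testing), 13 fields, F 0.939; 500 items (350 for training, 150 for testing), 13 fields, F 0.915. Councill et al.: CRF, ParsCit; ParsCit on Cora [10], 13 fields, F 0.950. Ohta et al. [11]: CRF. Ohta et al.: OCR, CRF. Ohta et al. [12]: CRF.

CRF; 2; Table 1 [1].

Table 1  Labels used in [1]
Author         RA
Editor         RE
Translator     RTR
Author Other   RAOT
Title          RT
Booktitle      RBT
Journal        RW
Conference     RC
Volume         RV
Number         RN
Page           RPP
Publisher      RP
Day            RD
Month          RM
Year           RY
Location       RL
URL            RURL
Other          ROT

3. CRF

3.1
2; Table 1 [1]; Table 1, Other; 2: RA, RT, DC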
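2: D, DC(+) [1].

3.2 CRF
CRF [2]. For an input sequence $\mathbf{x} = x_1, \dots, x_n$ and a label sequence $\mathbf{y} = y_1, \dots, y_n$, the conditional probability is

$$ P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z_{\mathbf{x}}} \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k f_k(y_{i-1}, y_i, \mathbf{x}) \right) \qquad (1) $$

where $Z_{\mathbf{x}}$ is the normalization factor

$$ Z_{\mathbf{x}} = \sum_{\mathbf{y} \in \mathcal{Y}(\mathbf{x})} \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k f_k(y_{i-1}, y_i, \mathbf{x}) \right) \qquad (2) $$

Here $f_k(y_{i-1}, y_i, \mathbf{x})$ is a feature function of the labels at positions $i-1$ and $i$ and of $\mathbf{x}$, $\lambda_k$ is the weight of $f_k$, and $\mathcal{Y}(\mathbf{x})$ is the set of label sequences for $\mathbf{x}$. For an input $\mathbf{x}$, the label sequence is predicted as

$$ \hat{\mathbf{y}} = \arg\max_{\mathbf{y} \in \mathcal{Y}(\mathbf{x})} P(\mathbf{y} \mid \mathbf{x}) \qquad (3) $$

and each token $x_i$ receives the label $\hat{y}_i$.

3.3
CRF++ (footnote 2) is used. Table 2 lists the features of [1] and [3]: 56 Unigram templates and 1 Bigram template, 57 in total. Unigram <dictionary(i)>: resources in footnotes 3 to 7; 7.

2 http://taku910.github.io/crfpp/
3 http://www.census.gov/genealogy/names/
4 http://www.fallingrain.com/world/index.html
5 http://www.narosa.com/nbd/PublisherDistributed.asp
6 http://science.thomsonreuters.com
7 http://www.allconferences.com/

Table 2  Features [3]
Type     Feature                  Templates
Unigram  <token ab pos(0)>        1
         <token re pos(0)>        1
         <num char(0)>            1
         <num word(0)>            4
         <num period(0)>          4
         <f kanji(0)>             1
         <f hiragana(0)>          1
         <f katakana(0)>          1
         <f alphabet(0)>          1
         <f digit(0)>             1
         <h alphabet(0)>          1
         <h digit(0)>             1
         <h symbol(0)>            1
         <first 1-4 string(0)>    4
         <last 1-4 string(0)>     4
         <token(0)>               1
         <last char(i)>           1
         <token lc(i)>            1
         <capital(i)>             1
         <digit(i)>               1
         <symbol(i)>              2
         <keyword(i)>             4
         <dictionary(i)>          15
         <num token(0)>           1
         <editor(0)>              1   Editor
         <URL(0)>                 1   URL
Bigram   <y(-1), y(0)>            1

Dict: 1, 2, "10", "July", 2. Dict: 3. In [3], <keyword(i)> and <dictionary(i)>: 7; in [1], <keyword(i)>: 5 and <dictionary(i)>: 7. Table 2 [3]: 2, 5, 2, 1, 5, 7, 2, 0. $i \in \{-4, -3, -2, -1, 0, 1, 2, 3, 4\}$. Table 2: <first 1-4 string(0)>: 4. Bigram.

4.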
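Equations (1) to (3) are not evaluated by enumerating $\mathcal{Y}(\mathbf{x})$, whose size grows exponentially with $n$. The sketch below computes $\log Z_{\mathbf{x}}$ with the forward algorithm and the arg max of Eq. (3) with the Viterbi algorithm, for a first-order chain where each position contributes $\sum_k \lambda_k f_k(y_{i-1}, y_i, \mathbf{x})$. CRF++ performs these computations internally. The potential `toy_psi`, its weights, and the start symbol `<S>` standing in for $y_0$ are hypothetical illustrations, not this paper's features; positions are 0-based in code.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence

START = "<S>"  # hypothetical start label playing the role of y_0 in Eq. (1)

# psi(y_prev, y_cur, x, i) = sum_k lambda_k * f_k(y_prev, y_cur, x) at position i
Potential = Callable[[str, str, Sequence[str], int], float]


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(y: Sequence[str], x: Sequence[str], psi: Potential) -> float:
    """Exponent of Eq. (1): weighted features summed over all positions."""
    total, prev = 0.0, START
    for i, cur in enumerate(y):
        total += psi(prev, cur, x, i)
        prev = cur
    return total


def log_partition(x: Sequence[str], labels: Sequence[str], psi: Potential) -> float:
    """log Z_x of Eq. (2), computed with the forward algorithm."""
    alpha = {y: psi(START, y, x, 0) for y in labels}
    for i in range(1, len(x)):
        alpha = {
            cur: log_sum_exp([alpha[prev] + psi(prev, cur, x, i) for prev in labels])
            for cur in labels
        }
    return log_sum_exp(list(alpha.values()))


def probability(y: Sequence[str], x: Sequence[str], labels: Sequence[str], psi: Potential) -> float:
    """P(y | x) of Eq. (1)."""
    return math.exp(sequence_score(y, x, psi) - log_partition(x, labels, psi))


def viterbi(x: Sequence[str], labels: Sequence[str], psi: Potential) -> List[str]:
    """arg max of Eq. (3) by dynamic programming over label bigrams."""
    delta = {y: psi(START, y, x, 0) for y in labels}
    back: List[Dict[str, str]] = []
    for i in range(1, len(x)):
        new_delta, ptr = {}, {}
        for cur in labels:
            best = max(labels, key=lambda p: delta[p] + psi(p, cur, x, i))
            new_delta[cur] = delta[best] + psi(best, cur, x, i)
            ptr[cur] = best
        delta = new_delta
        back.append(ptr)
    path = [max(labels, key=lambda y: delta[y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical weights, for illustration only.
EMIT = {("AUTHOR", "Okada"): 2.0, ("TITLE", "SVM"): 2.0, ("YEAR", "2004"): 3.0}
TRANS = {("AUTHOR", "TITLE"): 1.0, ("TITLE", "YEAR"): 1.0}


def toy_psi(prev: str, cur: str, x: Sequence[str], i: int) -> float:
    """One token-label feature plus one label-transition feature."""
    return EMIT.get((cur, x[i]), 0.0) + TRANS.get((prev, cur), 0.0)


if __name__ == "__main__":
    labels = ["AUTHOR", "TITLE", "YEAR"]
    x = ["Okada", "SVM", "2004"]
    # Brute-force Z_x over all of Y(x) agrees with the forward algorithm.
    brute = log_sum_exp([sequence_score(y, x, toy_psi)
                         for y in itertools.product(labels, repeat=len(x))])
    print(abs(brute - log_partition(x, labels, toy_psi)) < 1e-9)  # True
    print(viterbi(x, labels, toy_psi))  # ['AUTHOR', 'TITLE', 'YEAR']
```

The forward pass costs $O(n L^2)$ for $L$ labels instead of $O(L^n)$ for enumeration, which is why the brute-force check above is only practical for toy inputs.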
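Section 3.3 describes the features as CRF++ templates (56 Unigram and 1 Bigram) and mentions offsets $i \in \{-4, \dots, 4\}$. The sketch below shows how such a window expands into CRF++ template lines, using CRF++'s `U<id>:%x[row,col]` unigram syntax and a single `B` line for the label bigram <y(-1), y(0)>. The function name, the column layout, and the two feature specs are hypothetical; this is not the paper's actual template file.

```python
from typing import Iterable, List, Sequence, Tuple

# Offsets i in {-4, ..., 4}; which features of Table 2 use this window is an
# assumption here, so the specs in the example below are hypothetical.
WINDOW: Sequence[int] = range(-4, 5)


def build_crfpp_template(specs: Iterable[Tuple[str, int, Iterable[int]]]) -> List[str]:
    """Return CRF++ template lines.

    Each spec is (comment, column, offsets); one unigram template
    U<id>:%x[offset,column] is emitted per offset. The final 'B' line adds
    the label-bigram template, i.e. <y(-1), y(0)>.
    """
    lines: List[str] = ["# Unigram"]
    uid = 0
    for comment, column, offsets in specs:
        lines.append(f"# {comment}")
        for offset in offsets:
            lines.append(f"U{uid:02d}:%x[{offset},{column}]")
            uid += 1
    lines += ["", "# Bigram", "B"]
    return lines


if __name__ == "__main__":
    # Hypothetical column layout: column 0 = token, column 1 = capitalization flag.
    specs = [("token(0)", 0, [0]), ("capital(i)", 1, WINDOW)]
    print("\n".join(build_crfpp_template(specs)))
```

Each additional spec appends its own unigram lines with fresh IDs, so a larger set such as the one in Table 2 would come from a longer spec list.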
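Table 3  Mapping from the labels of [1] to nine labels
RA, RE, RTR, RAOT   AUTHOR
RT, RBT             TITLE
RW, RC              JOURNAL
RV, RN, RPP         VOLUME
RP                  PUBLISHER
RD                  DAY
RM                  MONTH
RY                  YEAR
RL, RURL, ROT       OTHER

Table 4  Number of Dict entries per label
Label       IEICE-J   IEICE-E   IPSJ    3journal
AUTHOR      7,210     6,272     6,730   19,391
TITLE       4,409     4,289     4,308   12,835
JOURNAL     1,551     1,747     2,026    5,004
VOLUME      2,221     2,181     1,763    3,441
PUBLISHER     274       336       400      842
DAY             9        54        11       67
MONTH          23        31        32       58
YEAR           60        59        52       78
OTHER         224       432       618    1,107

The labels of [1] (Table 1) are merged into the nine labels of Table 3: AUTHOR, TITLE, JOURNAL, VOLUME, PUBLISHER, DAY, MONTH, YEAR, and OTHER. Three journals are used: IEICE-J, IEICE-E, and IPSJ (Tables 1, 3, and 4), and 3journal denotes the three journals combined. 3journal; CRF; 9 labels; Table 2 <dictionary(i)>; Dict. 3; VOLUME, DAY, and MONTH; 3. Dict: 104. Dict: 1, 2, 5, 18, 19. The resulting set has 60 Unigram templates and 1 Bigram template, 61 in total.

Table 5  Comparison with [1]
Journal    [1]      Proposed
IEICE-J    0.9662   0.9887
IEICE-E    0.9709   0.9895
IPSJ       0.9646   0.9906

5.

5.1
IEICE-J   2000   4,787   2,193
IEICE-E   2000   4,497   0
IPSJ      2000   4,574   1,537

Table 5; Table 1 [1]; 3, 3, 3. The CRF is implemented with CRF++.

5.2
Tables 4 and 5: compared with [1], the scores improve by 2.25 points on IEICE-J, 1.86 on IEICE-E, and 2.6 on IPSJ, i.e., the differences between the two columns of Table 5 (for example, 0.9887 − 0.9662 = 0.0225). Dict: 9, 1.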
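The label merging of Table 3 and the gains reported in Section 5.2 can be checked with a few lines of Python. The dictionary below transcribes Table 3; `coarsen` and `gain_in_points` are hypothetical helper names. Reading the second column of Table 5 as the proposed setting is an assumption, supported by the fact that its differences from [1] reproduce the reported 2.25, 1.86, and 2.6 points.

```python
from typing import Dict, List, Tuple

# Table 3: labels of [1] (Table 1) merged into the nine labels.
COARSE_LABEL: Dict[str, str] = {
    "RA": "AUTHOR", "RE": "AUTHOR", "RTR": "AUTHOR", "RAOT": "AUTHOR",
    "RT": "TITLE", "RBT": "TITLE",
    "RW": "JOURNAL", "RC": "JOURNAL",
    "RV": "VOLUME", "RN": "VOLUME", "RPP": "VOLUME",
    "RP": "PUBLISHER",
    "RD": "DAY", "RM": "MONTH", "RY": "YEAR",
    "RL": "OTHER", "RURL": "OTHER", "ROT": "OTHER",
}


def coarsen(labels: List[str]) -> List[str]:
    """Map a sequence of [1]-style labels to the nine labels of Table 3."""
    return [COARSE_LABEL[label] for label in labels]


# Table 5: (score of [1], score of the second column) per journal.
TABLE5: Dict[str, Tuple[float, float]] = {
    "IEICE-J": (0.9662, 0.9887),
    "IEICE-E": (0.9709, 0.9895),
    "IPSJ": (0.9646, 0.9906),
}


def gain_in_points(baseline: float, proposed: float) -> float:
    """Difference of two scores in [0, 1], expressed in points (x100)."""
    return round((proposed - baseline) * 100, 2)


if __name__ == "__main__":
    print(coarsen(["RA", "RT", "RW", "RV", "RY"]))
    for journal, (base, new) in TABLE5.items():
        print(journal, gain_in_points(base, new))
```

Rounding to two decimals absorbs floating-point noise in the subtraction, so the printed gains match the values stated in the text.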
Table 6
IEICE-J    0.9662   0.9659
IEICE-E    0.9709   0.9702
IPSJ       0.9646   0.9646

(a) IEICE-J. Figure 3: IEICE-E, IEICE-J, JOURNAL, 1. Figure 4(c) IPSJ: TITLE, JOURNAL. IPSJ: TITLE, JOURNAL; 2. IPSJ, 2: JOURNAL 1; TITLE, JOURNAL 2; TITLE, JOURNAL 1.

[Figure 3: (a) IEICE-J, (b) IEICE-E, (c) IPSJ]

6.

[Figure 4: Dict; (a) IEICE-J, (b) IEICE-E, (c) IPSJ]

Figure 4(a) IEICE-J: TITLE, JOURNAL, VOLUME. IEICE-J: TITLE, JOURNAL, VOLUME; Figure 3: JOURNAL 1. Figure 4(b) IEICE-E: AUTHOR, TITLE, JOURNAL. IEICE-E: AUTHOR, TITLE, JOURNAL.

Section 5.2: NII (CiNii) (footnote 8) and footnote 9: 1,042; dblp (footnote 10) journals: 742; Table 6; 6; 1,800.

8 http://ci.nii.ac.jp/journal/society/all ja.html
9 https://www.ieice.org/jpn/shiori/pdf/furoku e.pdf
10 http://dblp.uni-trier.de/
7.
CRF: compared with [1], the scores improve by 2.25 points on IEICE-J, 1.86 on IEICE-E, and 2.6 on IPSJ. IEICE-J, IEICE-E, IPSJ.

Acknowledgments: Grant-in-Aid for Scientific Research (B) (15H02789) and (C) (25330384).

References
[1] ..., vol. 8, no. 2, pp. 18-29, 2015.
[2] J. Lafferty, A. McCallum and F. Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, In Proc. of the 18th International Conference on Machine Learning, pp. 282-289, 2001.
[3] ..., CRF ..., vol. 2015-DBS-162, no. 3, pp. 1-8, 2015.
[4] ..., 2003-FI-72/2003-NL-157, pp. 83-90, 2003.
[5] T. Okada, A. Takasu and J. Adachi, Bibliographic Component Extraction Using Support Vector Machines and Hidden Markov Models, ECDL 2004, LNCS 3332, pp. 501-512, 2004.
[6] F. Peng and A. McCallum, Accurate Information Extraction from Research Papers Using Conditional Random Fields, HLT-NAACL 2004, pp. 329-336, 2004.
[7] I. G. Councill, C. L. Giles and M. Y. Kan, ParsCit: An Open-Source CRF Reference String Parsing Package, In Proc. of the Language Resources and Evaluation Conference (LREC), 2008.
[8] C. Cortes and V. Vapnik, Support-Vector Networks, Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[9] K. Seymore, A. McCallum and R. Rosenfeld, Learning Hidden Markov Model Structure for Information Extraction, In AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.
[10] A. McCallum, K. Nigam, J. Rennie and K. Seymore, Automating the Construction of Internet Portals with Machine Learning, Information Retrieval, vol. 3, no. 2, pp. 127-163, 2000.
[11] M. Ohta, R. Inoue and A. Takasu, Empirical Evaluation of Active Sampling for CRF-Based Analysis of Pages, In Proc. of IEEE IRI 2010, pp. 13-18, 2010.
[12] M. Ohta, R. Inoue and A. Takasu, Empirical Evaluation of CRF-Based Bibliography Extraction from Research Papers, IADIS International Journal on Computer Science and Information Systems, vol. 7, no. 2, pp. 18-31, 2012.