Approach to Automatic Translation Template Acquisition Based on Statistical Learning

HU Ri-le, ZONG Cheng-qing, XU Bo
(National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China)

Abstract: In this paper, we propose a new approach that automatically acquires translation templates from unannotated bilingual spoken language corpora. The approach is unsupervised, statistical, and data-driven. It relies on two basic algorithms: a grammar induction algorithm and an alignment algorithm using Bracketing Transduction Grammar. First, the semantic groups and the phrasal structure groups are extracted from both the source language and the target language. Second, the alignment algorithm based on Bracketing Transduction Grammar aligns the phrasal structure groups. The aligned phrasal structure groups are then post-processed into translation templates. Preliminary experimental results show that our algorithm is effective and practical.

Key words: bilingual grammar induction; translation template acquisition; structure alignment; machine translation

Related work includes Kitano [1] and Sato [2], Güvenir and Cicekli [3][4], and Watanabe and Imamura [5][6]; see also [7] and [9]. The approach described here combines grammar induction with Bracketing Transduction Grammar (BTG): semantic groups (SC) and phrasal structure groups (PC) are acquired first, and the PC are then aligned with BTG.
Chese part: SCC0 PCC3 PCC8 PCC3 SCC0 PCC PCC0 PCC PCC8 Eglsh part: SCE5 sgle double stadard PCE wat to PCE4 a SCE5 room PCE8 I PCE reserve PCE4 PCE8 PCE4 [[I/ [wat/ to/ ] reserve/ ] [a/ / N * room/n]]. to/ to N= N * =sgle; N= N * =double; N= N * =stadard I wat to reserve N a N * room N= N * =sgle; N= N * =double; N= N * =stadard 3 3. SC PC =0; =+ N N Kullback-Lebler KL
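The KL distance between two probability distributions p_1 and p_2 is

$$D(p_1 \,\|\, p_2) = \sum_{i=1}^{V} p_1(i) \log \frac{p_1(i)}{p_2(i)} \qquad (1)$$

where p_1 and p_2 are the distributions associated with the entities e_1 and e_2, and V is the vocabulary size. Because D is not symmetric, the divergence is defined as

$$Div(p_1, p_2) = D(p_1 \,\|\, p_2) + D(p_2 \,\|\, p_1) \qquad (2)$$

and the distance between two entities sums the divergences of their left-context and right-context distributions:

$$Dist(e_1, e_2) = Div(p_1^{left}, p_2^{left}) + Div(p_1^{right}, p_2^{right}) \qquad (3)$$

An extended distance Dist*(e_1, e_2) (4) sums four Div terms, two over left-context distributions and two over right-context distributions; the arguments of the four terms are not legible in this copy. The similarity is then

$$SIM(e_1, e_2) = \frac{1}{1 + Dist^{*}(e_1, e_2)} \qquad (5)$$

The context of an entity e can also be described by features (pos, w), where w is a word in the context of e and pos is its position, left or right. Two entities are then represented by the feature vectors u = (u_1, u_2, …, u_n) and v = (v_1, v_2, …, v_n), and three further similarity measures are considered: the Cosine Measure, the Cosine of Pointwise Mutual Information, and the Dice Co-efficient.

$$Cos(u, v) = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^2}\,\sqrt{\sum_{i=1}^{n} v_i^2}} \qquad (6)$$

$$CosPMI(u, v) = \frac{\sum_{i=1}^{n} pmi(i, u)\, pmi(i, v)}{\sqrt{\sum_{i=1}^{n} pmi(i, u)^2}\,\sqrt{\sum_{i=1}^{n} pmi(i, v)^2}} \qquad (7)$$

$$pmi(i, u) = \log \frac{P(i, u)}{P(i)\, P(u)} \qquad (8)$$

where P(i, u) is the joint probability of feature i and u, and P(i) and P(u) are the corresponding marginal probabilities.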
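The Dice coefficient is computed over binarized features:

$$Dice(u, v) = \frac{2 \sum_{i=1}^{n} s(u_i)\, s(v_i)}{\sum_{i=1}^{n} s(u_i) + \sum_{i=1}^{n} s(v_i)} \qquad (9)$$

where s(x) = 1 if x > 0 and s(x) = 0 otherwise.

After the SC have been obtained, the PC are acquired using the mutual information (MI) of two entities e_1 and e_2:

$$MI(e_1, e_2) = P(e_1, e_2) \log \frac{P(e_2 \mid e_1)}{P(e_2)} \qquad (10)$$

For the alignment step, let e_1, …, e_T be the English sentence and c_1, …, c_V the Chinese sentence. The substring e_{s..t} denotes e_{s+1}, e_{s+2}, …, e_t, and c_{u..v} denotes c_{u+1}, …, c_v. The BTG alignment is computed with the following recurrences:

$$\delta(s,t,u,v) = \max\left[\, \delta^{[\,]}(s,t,u,v),\ \delta^{\langle\rangle}(s,t,u,v) \,\right]$$

$$\delta^{[\,]}(s,t,u,v) = \max_{\substack{s \le S \le t,\ u \le U \le v \\ (S-s)(t-S) + (U-u)(v-U) \ne 0}} F_e(s,t)\, F_c(u,v)\, \delta(s,S,u,U)\, \delta(S,t,U,v)$$

$$\delta^{\langle\rangle}(s,t,u,v) = \max_{\substack{s \le S \le t,\ u \le U \le v \\ (S-s)(t-S) + (U-u)(v-U) \ne 0}} F_e(s,t)\, F_c(u,v)\, \delta(s,S,U,v)\, \delta(S,t,u,U)$$

Here F_e(s,t) and F_c(u,v) are weighting functions on the English span e_{s..t} and the Chinese span c_{u..v}; their definitions, and the exact arguments with which they appear in the recurrences, are not fully legible in this copy.

Example of an alignment result, where each English word is followed by the position of the Chinese word it is aligned to:

[[I/1 [want/2 to/3] reserve/4] [a/5 single/6 room/7]]. [7]

[Table: statistics of the experimental corpus; the surviving values are 950, 989, 074, 7.0 and 6.7, with some digits missing.]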
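Returning to the BTG alignment recurrences given earlier in this section, the following Python sketch shows how the maximization over straight and inverted combinations can be computed with memoization. It is an illustration under stated assumptions, not the authors' implementation: the lexical table `lex`, the fixed scores for unaligned and unknown word pairs, and the hooks `f_e` and `f_c` (stand-ins for F_e and F_c, defaulting to 1) are all hypothetical. Scores are kept in log space.

```python
import math
from functools import lru_cache

def btg_align(e, c, lex, null_p=1e-4, unk_p=1e-6,
              f_e=lambda s, t: 1.0, f_c=lambda u, v: 1.0):
    """Best BTG alignment of word lists e and c.
    lex maps (e_word, c_word) to a probability (hypothetical values);
    null_p scores a word left unaligned, unk_p an unseen word pair.
    f_e and f_c must return positive weights for a span.
    Returns (log score, sorted list of 1-based (e_pos, c_pos) links)."""

    def lex_logp(i, j):
        p = lex.get((e[i - 1], c[j - 1]))
        return math.log(p) if p else math.log(unk_p)

    @lru_cache(maxsize=None)
    def delta(s, t, u, v):
        n_e, n_c = t - s, v - u
        if n_e == 1 and n_c == 1:          # one word on each side
            return lex_logp(t, v), ((t, v),)
        if n_e + n_c == 1:                 # one word aligned to nothing
            return math.log(null_p), ()
        best = (-math.inf, ())
        weight = math.log(f_e(s, t)) + math.log(f_c(u, v))
        for S in range(s, t + 1):
            for U in range(u, v + 1):
                # Constraint from the recurrence: skip degenerate splits.
                if (S - s) * (t - S) + (U - u) * (v - U) == 0:
                    continue
                straight = (delta(s, S, u, U), delta(S, t, U, v))
                inverted = (delta(s, S, U, v), delta(S, t, u, U))
                for left, right in (straight, inverted):
                    score = weight + left[0] + right[0]
                    if score > best[0]:
                        best = (score, left[1] + right[1])
        return best

    score, links = delta(0, len(e), 0, len(c))
    return score, sorted(links)
```

For example, with `e = ["x", "y"]`, `c = ["Y", "X"]` and a table that favours the pairs ("x", "X") and ("y", "Y"), the inverted combination wins and the function returns the links `[(1, 2), (2, 1)]`. The search is exhaustive over all split points, so this sketch is only practical for short sentences.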
The accuracy is defined as

$$Acc = \frac{N_r}{N} \times 100\%$$

where N is the total number of items evaluated and N_r is the number of those judged correct.

Table 3. Accuracy of the BTG baseline and of our approach

Method          Acc (%)
BTG             63.58
Our approach    75.44

Table 4. Accuracy with different similarity measures

Similarity measure                         Acc (%)
Dist*                                      75.44
Cosine Measure                             73.
Cosine of Pointwise Mutual Information     76.77
Dice Co-efficient                          75.69

References:
[1] H. Kitano. A Comprehensive and Practical Model of Memory-based Machine Translation[A]. In 13th IJCAI[C]. Chambery, France. 1993.
[2] Satoshi Sato. MBT2: a method for combining fragments of examples in example-based translation[J]. Artificial Intelligence, 1995, 75: 31-50.
[3] H. Altay Güvenir and Ilyas Cicekli. Learning Translation Templates from Examples[J]. Information Systems, 1998, Vol. 23, No. 6, pp. 353-363.
[4] Ilyas Cicekli and Halil Altay Güvenir. Learning Translation Templates from Bilingual Translation Examples[J]. Applied Intelligence, 2001, Vol. 15, No. 1, pp. 57-76.
[5] H. Watanabe, S. Kurohashi, and E. Aramaki. Finding Structural Correspondences from Bilingual Parsed Corpus for Corpus-based Translation[A]. In Proceedings of the 18th International Conference on Computational Linguistics[C], 2000. pp. 906-912.
[6] K. Imamura. Hierarchical Phrase Alignment Harmonized with Parsing[A]. In Proceedings of the 6th Natural Language Processing Pacific Rim Symposium[C], 2001. pp. 377-384.
[7] Dekai Wu. Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora[J]. Computational Linguistics, 1997, Vol. 23, No. 3, pp. 377-403.
[8] Helen M. Meng and Kai-Chung Siu. Semi-Automatic Acquisition of Domain-Specific Semantic Structures[J]. IEEE Transactions on Knowledge and Data Engineering, 2002, Vol. 14, No. 1, January/February, pp. 172-180.
[9] Yajuan Lü, Ming Zhou, Sheng Li, Changning Huang and Tiejun Zhao. Automatic Translation Template Acquisition Based on Bilingual Structure Alignment[J]. Computational Linguistics and Chinese Language Processing, 2001, Vol. 6, No. 1, February, pp. 83-108.
[10] Shaojun Zhao and Dekang Lin. A Nearest-Neighbor Method for Resolving PP-Attachment Ambiguity[A]. In Proceedings of the First International Joint Conference on Natural Language Processing (IJCNLP 2004)[C]. 2004, March, Sanya, pp. 428-434.
[11] Rile Hu, Chengqing Zong and Bo Xu. Semiautomatic Acquisition of Translation Templates from Monolingual Unannotated Corpora[A]. In Proceedings of International Conference on Natural Language Processing and Knowledge Engineering[C], 2003, October, Beijing, pp. 163-167.