Noun phrase recognition in Machine Translation systems
Language of study: Modern Greek
Laboratory of Translation and Language Processing, Aristotle University of Thessaloniki
by Kyriaki IOANNIDOU (kiroanni@auth.gr), Eleni TZIAFA (etziafa@auth.gr) and Rania VOSKAKI (rvoskaki@hotmail.com)
International Meeting on Languages, Applied Linguistics and Translation, Évora, 6-7/12/2012
Presentation overview
Lab of Translation & Language Processing: theoretical and methodological framework
Noun phrase: definition and limitations of recognition
Applications in Machine Translation:
  1. Results of NP recognition in existing MT systems
  2. Methodology used in our research
  3. Typology of noun phrases adapted for Greek
  4. Resource produced
  5. Results of noun phrase recognition using our resource
Laboratory activities
Theoretical framework: transformational grammar (Harris 1951; Harris 1965; Harris 1976)
Methodology of Lexicon-Grammar: formal and exhaustive description of language using matrices (Gross 1975)
Syntactic-semantic dictionaries of predicates:
  ID=V_32GC_245; lex-info=[cat="verb",verb=[lemma= τρώω (eat)]] args= ([pos="0", ([cat="np",hum="true"], [cat="np",nothum="false"])], [pos="1", [cat="np",conc="true"]])
Example:
  Το μωρό τρώει ένα μήλο (The baby is eating an apple)
  but NOT
  *Το μηχάνημα τρώει ένα μήλο (*The machine is eating an apple)
Parser exploiting these resources: under construction
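The dictionary entry for τρώω (eat) can be read as a set of constraints on the verb's arguments. The following is a minimal sketch of how such an entry might be checked, not the laboratory's parser (which the slide describes as under construction). The Python encoding, the toy `NOUN_FEATURES` lexicon, and the reading of the two subject descriptions as alternatives are all assumptions made for illustration.

```python
# Hypothetical, simplified encoding of entry V_32GC_245 for τρώω (eat).
# Each argument position lists alternative feature constraints.
ENTRY = {
    "id": "V_32GC_245",
    "lemma": "τρώω",
    "args": {
        0: [{"cat": "np", "hum": True}, {"cat": "np", "nothum": False}],
        1: [{"cat": "np", "conc": True}],
    },
}

# Toy feature lexicon, assumed for illustration only.
NOUN_FEATURES = {
    "μωρό": {"cat": "np", "hum": True, "nothum": False, "conc": True},
    "μηχάνημα": {"cat": "np", "hum": False, "nothum": True, "conc": True},
    "μήλο": {"cat": "np", "hum": False, "nothum": True, "conc": True},
}


def satisfies(features, constraint):
    """True if every feature required by the constraint has the required value."""
    return all(features.get(key) == value for key, value in constraint.items())


def accepts(entry, subject, obj):
    """True if subject (position 0) and object (position 1) each satisfy
    at least one of the alternatives listed for their position."""
    fillers = {0: NOUN_FEATURES[subject], 1: NOUN_FEATURES[obj]}
    return all(
        any(satisfies(fillers[pos], alternative) for alternative in alternatives)
        for pos, alternatives in entry["args"].items()
    )


print(accepts(ENTRY, "μωρό", "μήλο"))      # the baby eats an apple
print(accepts(ENTRY, "μηχάνημα", "μήλο"))  # *the machine eats an apple
```

Under these assumed features, the first call accepts the sentence with the human subject and the second rejects the sentence with the machine as subject, mirroring the contrast on the slide.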
Machine translation systems
Rule-based Machine Translation (Nirenburg 1989)
Statistical Machine Translation (Brown et al. 1993)
Example-based Machine Translation (Nagao 1984)
Hybrid Machine Translation (Boretz 2009)
  rules used at: pre-processing, processing, post-processing
We constructed machine-readable rules to recognize simple and complex noun phrases
Vauquois triangle (figure not reproduced)
Workflow
Greek text
Machine translation: Babel Fish, Bing Translator, Google Translate, Systran, Wordlingo
  Analysis (our contribution: noun phrase recognition)
  Transfer
  Generation
French text (one output per system)
Noun phrases in Machine Translation systems
We are not dealing with problems in:
semantics (recognition, transfer, generation)
  ανάπτυξη
  a) croissance (growth)
  b) développement (development)
morphology (e.g. unknown words)
transliteration
  πολιτιστικής κληρονομιάς (cultural inheritance)
  politistikis klironomias [Bing Translator]
  H ρεπόρτερ μας (our reporter)
  a) Notre H reporter [Babel Fish, Wordlingo]
  b) H notre journaliste [Bing Translator, Google Translate]
Noun phrases in Machine Translation systems
We are not dealing with problems in structure during the generation phase (target language):
  τα έργα (the projects)
  l des projets [Google Translate]
We focus on structure during the recognition phase (source language):
  ο τομέας της πολιτιστικής κληρονομιάς (cultural inheritance domain)
  son secteur d'héritage culturel [Systran] (his domain of cultural inheritance)
Noun phrase recognition in Machine Translation systems
(adjective)* + noun
εμφανή ευρωπαϊκή διάσταση (clear European dimension)
  un éminent dimension européenne [Google Translate]
  masculine [feminine noun feminine adjective]
Η κοινοτική χρηματοδοτική στήριξη (Community financial support)
  La communautaire soutien financier [Babel Fish, Systran]
  feminine adjective [masculine noun masculine adjective]
  Communauté soutien financier [Bing Translator, Google Translate]
  feminine noun [masculine noun masculine adjective]
Noun phrase recognition in Machine Translation systems
NP (Npgen)*
μία περίοδος δράσης τεσσάρων ετών (a period of four years of action)
  une période d'action quatre ans [Bing Translator]
  une durée d'action quatre années [Google Translate]
την επιμόρφωση των επαγγελματιών του κλάδου της πολιτιστικής κληρονομιάς (the training of professionals in the domain of cultural inheritance)
  la formation de professionnels secteur du patrimoine culturel [Bing Translator]
  la formation des professionnels domaine du patrimoine culturel [Google Translate]
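The NP (Npgen)* pattern says that a noun phrase may be followed by any number of genitive noun phrases that belong to it. A rough sketch of that grouping step follows, assuming base noun phrases have already been recognized and labelled with their case. The function name `merge_genitive_chains` and the input format are hypothetical, and the sketch assumes the listed phrases are adjacent in the text.

```python
def merge_genitive_chains(base_nps):
    """Attach each genitive base NP to the phrase that precedes it.

    base_nps: list of (text, case) pairs in text order, assumed adjacent.
    Returns the list of merged noun phrase strings.
    """
    merged = []
    for text, case in base_nps:
        if case == "gen" and merged:
            # A genitive NP extends the noun phrase built so far.
            merged[-1] = merged[-1] + " " + text
        else:
            merged.append(text)
    return merged


period = [("μία περίοδος", "nom"), ("δράσης", "gen"), ("τεσσάρων ετών", "gen")]
training = [
    ("την επιμόρφωση", "acc"),
    ("των επαγγελματιών", "gen"),
    ("του κλάδου", "gen"),
    ("της πολιτιστικής κληρονομιάς", "gen"),
]
print(merge_genitive_chains(period))
print(merge_genitive_chains(training))
```

Each example comes out as a single noun phrase, which is the grouping the translations on this slide appear to miss. A real grammar would also need to handle genitives that are arguments of a verb rather than of the preceding noun, which this sketch ignores.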
Noun phrase recognition in Machine Translation systems
ambiguities among determiners (definite articles vs. possessive determiners)
του/της
  a. the (genitive) (: of the)
  b. his/her
articles
  τα πόδια του τραπεζιού
  the legs the (gen. case) table (gen. case) (: the legs of the table)
  [του: definite article in genitive case]
possessive determiners
  το βιβλίο του
  the book his (: his book)
  [του: possessive determiner]
*literal translations
Noun phrase recognition in Machine Translation systems
ambiguities among determiners (definite articles vs. possessive determiners)
του/της
  a. the (genitive) (: of the)
  b. his/her
problematic structure
  ο τομέας της πολιτιστικής κληρονομιάς
  article noun article/possessive adjective noun
correct recognition:
  [ο τομέας] [της πολιτιστικής κληρονομιάς]
  ([the domain] [(of) the cultural inheritance])
wrong recognition:
  [ο τομέας της] [πολιτιστικής κληρονομιάς]
  [son secteur] [d'héritage culturel] [Systran]
  (: his domain of cultural inheritance)
*literal translations
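One way to picture the του/της decision is a lookahead rule: if the ambiguous form is followed by an adjective or noun in the genitive, treat it as an article opening a new phrase; otherwise attach it to the preceding noun as a possessive. The sketch below illustrates only this heuristic and is not the grammars described in this presentation. The tag names, the token format, and the function `bracket` are assumptions.

```python
AMBIGUOUS = {"του", "της"}


def bracket(tokens):
    """Split a tagged sequence into bracketed groups.

    tokens: list of (word, pos, case) triples; the pos of an ambiguous
    form is not consulted, only the token that follows it.
    """
    chunks, current = [], []
    for i, (word, _pos, _case) in enumerate(tokens):
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        opens_genitive_np = (
            following is not None
            and following[1] in {"ADJ", "NOUN"}
            and following[2] == "gen"
        )
        if word in AMBIGUOUS and current and opens_genitive_np:
            # Genitive article: close the current phrase, start a new one.
            chunks.append(current)
            current = [word]
        else:
            # Ordinary token, or a possessive attaching to the noun before it.
            current.append(word)
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]


domain = [
    ("ο", "DET", "nom"), ("τομέας", "NOUN", "nom"), ("της", "AMBIG", "gen"),
    ("πολιτιστικής", "ADJ", "gen"), ("κληρονομιάς", "NOUN", "gen"),
]
book = [("το", "DET", "nom"), ("βιβλίο", "NOUN", "nom"), ("του", "AMBIG", "gen")]
print(bracket(domain))
print(bracket(book))
```

With these inputs the first phrase splits as [ο τομέας] [της πολιτιστικής κληρονομιάς] and the second stays whole as [το βιβλίο του]. The rule is deliberately simple: a possessive that happens to be followed by an unrelated genitive phrase would be misread, so a fuller treatment needs more context than one token of lookahead.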
Noun phrase recognition in Machine Translation systems
quotation marks
την ψυχολογία του "ασθενή" μου
(: the psychology (of) the patient my)
(: my patient's psychology)
  la psychologie du "? [astheni]"? mon [Babel Fish]
  la psychologie de «patient» moi [Google Translate]
  ma psychologie «malade» [Systran]
  ma psychologie «malade» [Wordlingo]
Noun phrase recognition
Chunking/shallow parsing (Abney 1991; Ramshaw & Marcus 1995)
Example of a chunked sentence:
  [My friend] [saw] [your new car]
      NP       VP         NP
Focus on noun phrase chunking (Voutilainen 1993; Tjong Kim Sang 2000; Bai, Li, Kim & Lee 2006)
Example of a noun chunk: [My friend]
Noun phrase recognition
Shallow parsing by using finite-state automata (Brill 1993; Roche 1993; Abney 1996; Blanc et al. 2007; Mokrane et al. 2008)
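As an illustration of shallow parsing with a finite-state automaton, the sketch below recognizes the pattern (determiner)? (adjective)* noun over part-of-speech tags and returns the longest match at each position. It is a toy automaton written for this example, not one of the grammars presented here. The tag set and the names `TRANSITIONS` and `chunk_noun_phrases` are assumptions.

```python
# Transition table of a small automaton for: (DET)? (ADJ)* NOUN
TRANSITIONS = {
    ("start", "DET"): "det",
    ("start", "ADJ"): "adj",
    ("start", "NOUN"): "noun",
    ("det", "ADJ"): "adj",
    ("det", "NOUN"): "noun",
    ("adj", "ADJ"): "adj",
    ("adj", "NOUN"): "noun",
}
ACCEPTING = {"noun"}


def chunk_noun_phrases(tagged):
    """Return noun chunks found in a list of (word, tag) pairs."""
    chunks = []
    i = 0
    while i < len(tagged):
        state, j, last_accept = "start", i, None
        # Follow transitions as far as possible from position i.
        while j < len(tagged) and (state, tagged[j][1]) in TRANSITIONS:
            state = TRANSITIONS[(state, tagged[j][1])]
            j += 1
            if state in ACCEPTING:
                last_accept = j
        if last_accept is not None:
            chunks.append(" ".join(word for word, _ in tagged[i:last_accept]))
            i = last_accept
        else:
            i += 1  # no chunk starts here; move on
    return chunks


english = [("My", "DET"), ("friend", "NOUN"), ("saw", "VERB"),
           ("your", "DET"), ("new", "ADJ"), ("car", "NOUN")]
greek = [("εμφανή", "ADJ"), ("ευρωπαϊκή", "ADJ"), ("διάσταση", "NOUN")]
print(chunk_noun_phrases(english))
print(chunk_noun_phrases(greek))
```

The English input yields the two noun chunks of the chunked-sentence example, and the Greek input is kept together as one phrase rather than split after the first adjective.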
Typology of Noun Phrases
Base Noun Phrases
  Διάβασα το βιβλίο (I read the book)
Maximal-Length Noun Phrases
  Διάβασα το βιβλίο του Γιώργου με το κόκκινο εξώφυλλο (I read George's book with the red cover)
Not taking into account subordinate clauses functioning as argument clauses or adjunct clauses (e.g. relatives, SV-clauses):
  I know what you did
  The woman who left a minute ago is my cousin
(Ramshaw & Marcus 1995; Tjong Kim Sang 2000; Bai, Li, Kim & Lee 2006)
Typology of Noun Phrases
Base Noun Phrases
Possible lexical heads:
noun
  Είδα το φίλο σου
  I saw the friend your (: I saw your friend)
pronoun
  Τον είδα
  Him I saw (: I saw him)
adjective (nominalization)
  Είδα τον άσχημο
  I saw the ugly (: I saw the ugly one)
adverb (nominalization)
  Γνώρισα τους πάνω
  I met the up (: I met the ones living upstairs)
*literal translations
Typology of Noun Phrases
Maximal-Length Noun Phrases
Formed from combinations of Base Noun Phrases:
use of genitive case
  Έβαψα το σπίτι του θείου μου
  I painted the house [my uncle]:gen. (: I painted my uncle's house)
apposition
  Τηλεφωνώ στο Γιάννη το συνάδελφό μου
  I am calling John my colleague
prepositional noun phrase
  Νοίκιασα το σπίτι με το φράχτη
  I rented the house with the fence
coordination
  Είδα το Γιάννη και τη Μαρία
  I saw John and Mary
*literal translations
Resource produced
916 grammars
  828 grammars for Base Noun Phrases
    730 with a noun as lexical head
    57 with an adjective as lexical head
    40 with a pronoun as lexical head
    1 with an adverb as lexical head
  88 grammars for Maximal-Length Noun Phrases
Results obtained
εμφανή ευρωπαϊκή διάσταση (clear European dimension)
  Result of Machine Translation system: un éminent dimension européenne [Google Translate]
  Recognition probably done: [εμφανή] [ευρωπαϊκή διάσταση]
  Result obtained by our grammars: [εμφανή ευρωπαϊκή διάσταση]
μία περίοδος δράσης τεσσάρων ετών (a period of action of four years)
  Result of Machine Translation system: une période d'action quatre ans [Bing Translator]
  Recognition probably done: [μία περίοδος δράσης] [τεσσάρων ετών]
  Result obtained by our grammars: [μία περίοδος δράσης τεσσάρων ετών]
ο τομέας της πολιτιστικής κληρονομιάς
  Result of Machine Translation system: son secteur d'héritage culturel [Systran]
  Recognition probably done: [ο τομέας της] [πολιτιστικής κληρονομιάς]
  Result obtained by our grammars: [ο τομέας της πολιτιστικής κληρονομιάς]
Resource evaluation
In our corpus of 40,000 words, our grammars recognized 8,722 noun phrases, covering 57% of the corpus.
For 320 of these 8,722 noun phrases, the recognized phrase was not an exact match (96.4% precision).
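The precision figure can be retraced from the two counts on this slide, assuming precision is taken here as the share of recognized noun phrases that were exact matches. The variable names below are illustrative.

```python
recognized = 8722   # noun phrases recognized by the grammars
not_exact = 320     # recognized phrases that were not an exact match

exact = recognized - not_exact
precision = exact / recognized

print(f"exact matches: {exact}")
print(f"precision: {precision:.1%}")
```

Computed this way the ratio comes to roughly 96.3%, very close to the 96.4% reported. The small difference may come from rounding or from counts not shown on the slide.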
References
Brown, P., Della Pietra, S., Della Pietra, V., & Mercer, R. (1993). The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2), pp. 263-311.
Gross, M. (1975). Méthodes en syntaxe. Régime des constructions complétives. Paris: Hermann.
Harris, Z. (1951). Methods in Structural Linguistics. Chicago: University of Chicago Press.
Harris, Z. (1965). Transformational Theory. Language, 41(3), pp. 363-401.
Harris, Z. (1976). Notes du cours de syntaxe. Paris: Éditions du Seuil.
Kyriacopoulou, T. (2005). L'analyse automatique des textes écrits : Le cas du grec moderne. Thessaloniki: University Studio Press.
Nagao, M. (1984). A Framework of a Mechanical Translation between Japanese and English by Analogy Principle. In A. Elithorn & R. Banerji (Eds.), Artificial and Human Intelligence (pp. 173-180). North-Holland.
Vauquois, B. (1968). A Survey of Formal Grammars and Algorithms for Recognition and Transformation in Machine Translation. Proceedings of the IFIP Congress-6, pp. 254-260.