LECTURE 2
Development of a CFPSG. Coverage of linguistic phenomena such as agreement and word order

CONTENTS
1. Developing a grammar fragment
2. A formalism that is too strong and too weak at the same time
3. References

1. Developing a grammar fragment

The Greek data

το παιδί παίζει
τα παιδιά τρώνε
το παιδί παίζει στον κήπο
τα παιδιά παίζουν στον κήπο με τα λουλούδια
το παιδί τρώει τη τυρόπιττα
τα παιδιά τρώνε τα μήλα

Introduce terminal symbols (parts of speech)

N: παιδί, παιδιά, κήπο, λουλούδια, τυρόπιττα, μήλα
V: παίζει, τρώνε, παίζουν, τρώει
Det: το, τα, τη
P: με, σε

Exercise 1: Indicate the phrase diagnostics (Lecture 1) used to identify the following groups of terminal symbols in the Greek data above:

Det + Noun (Det N)
Preposition + Det + Noun (P Det N)
Det + Noun + Preposition + Det + Noun (Det N P Det N)

The non-terminal symbol vocabulary

The string Det N recurs in every sentence of the data. On the basis of the work done in Lecture 1, we can consider it a phrase and use a non-terminal symbol in its place. So:

(1) NP → Det N
(2) PP → P Det N, and by (1): PP → P NP
(3) NP → Det N P Det N, and by (1) and (2): NP → NP PP
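The observation that Det N keeps recurring can be checked by tagging each sentence with the parts of speech above and counting adjacent tag pairs. The following Python sketch does this under one assumption that is not stated in the notes: the contracted form στον is treated as a preposition followed by a determiner, in line with the P Det N pattern of Exercise 1. The names LEXICON, CONTRACTIONS, tag and bigram_counts are illustrative only.

```python
from collections import Counter

# Lexicon copied from the terminal-symbol list in the notes.
LEXICON = {
    "N": ["παιδί", "παιδιά", "κήπο", "λουλούδια", "τυρόπιττα", "μήλα"],
    "V": ["παίζει", "τρώνε", "παίζουν", "τρώει"],
    "Det": ["το", "τα", "τη"],
    "P": ["με", "σε"],
}
TAG_OF = {word: tag for tag, words in LEXICON.items() for word in words}

# Assumption (not in the notes): the contracted form "στον" is split
# into preposition + determiner, matching the P Det N pattern.
CONTRACTIONS = {"στον": ["P", "Det"]}

SENTENCES = [
    "το παιδί παίζει",
    "τα παιδιά τρώνε",
    "το παιδί παίζει στον κήπο",
    "τα παιδιά παίζουν στον κήπο με τα λουλούδια",
    "το παιδί τρώει τη τυρόπιττα",
    "τα παιδιά τρώνε τα μήλα",
]

def tag(sentence):
    """Return the list of part-of-speech tags for a sentence."""
    tags = []
    for word in sentence.split():
        tags.extend(CONTRACTIONS.get(word) or [TAG_OF[word]])
    return tags

def bigram_counts(sentences):
    """Count every pair of adjacent tags over all sentences."""
    counts = Counter()
    for sentence in sentences:
        tags = tag(sentence)
        counts.update(zip(tags, tags[1:]))
    return counts

if __name__ == "__main__":
    for pair, n in bigram_counts(SENTENCES).most_common():
        print(pair, n)
```

On this data the pair (Det, N) comes out as the most frequent one, with 11 occurrences, which supports replacing it with the single symbol NP.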
The grammar for the Greek data above

S → NP VP
NP → Det N (PP)
VP → V (NP) (PP) (PP)

Note: Brackets indicate optionality. In fact, the rule NP → Det N (PP) is shorthand for two rules:

NP → Det N PP
NP → Det N

Exercise 2: Here is an alternative grammar for the same data. You are invited to argue for or against it.

S → NP VP
NP → (Det) NP
NP → N (PP)
VP → V (NP) (PP) (PP)

Exercise 3:

(3) NP → Det N
(4) NP → NP (PP)

Assume that we have a parser, that is, a program that matches rules against linguistic data and assigns non-terminal symbols to language strings. Assume that our parser is a top-down one, that is, it reads the rules in the order in which it finds them and tries to match them against the data. If a rule fails, it takes the next one with the same left-hand symbol. If all such rules fail, parsing fails.

With the grammar of Exercise 2, such a parser would take the initial symbol S and expand it to NP first and then to VP. The non-terminal symbol that has to be expanded next is NP. If NP succeeds, the parser will try to expand VP. If no expansion of NP or VP is successful, the parser fails.
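The bracket shorthand and the top-down strategy just described can be sketched in Python. This is a rough illustration, not the parser assumed in the exercises: it works on part-of-speech tags rather than words, it backtracks over all alternatives (an assumption, since the notes only say that the next rule is tried when one fails), and it uses the first grammar of this section, whose NP expansions all begin with a terminal symbol. The PP rule is taken from rule (2). All function names are hypothetical.

```python
from itertools import product

# The first grammar of this section, with optional symbols in brackets,
# plus PP -> P Det N from rule (2).
GRAMMAR_SHORTHAND = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "N", "(PP)"]),
    ("VP", ["V", "(NP)", "(PP)", "(PP)"]),
    ("PP", ["P", "Det", "N"]),
]

def expand_optional(rhs):
    """Spell out a right-hand side with bracketed symbols as plain rules."""
    choices = []
    for sym in rhs:
        if sym.startswith("(") and sym.endswith(")"):
            choices.append([[sym[1:-1]], []])  # keep the symbol, or drop it
        else:
            choices.append([[sym]])
    seen, result = set(), []
    for combo in product(*choices):
        flat = tuple(s for part in combo for s in part)
        if flat not in seen:  # V (NP) (PP) (PP) yields V NP PP twice
            seen.add(flat)
            result.append(list(flat))
    return result

def build_rules(shorthand):
    """Map each left-hand symbol to its ordered list of plain expansions."""
    rules = {}
    for lhs, rhs in shorthand:
        rules.setdefault(lhs, []).extend(expand_optional(rhs))
    return rules

def parse(symbol, tags, pos, rules):
    """Yield every position at which `symbol`, started at `pos`, can end."""
    if symbol not in rules:  # terminal symbol (part of speech)
        if pos < len(tags) and tags[pos] == symbol:
            yield pos + 1
        return
    for rhs in rules[symbol]:  # rules are tried in the order listed
        yield from match_sequence(rhs, tags, pos, rules)

def match_sequence(rhs, tags, pos, rules):
    """Match the symbols of one right-hand side from left to right."""
    if not rhs:
        yield pos
        return
    for mid in parse(rhs[0], tags, pos, rules):
        yield from match_sequence(rhs[1:], tags, mid, rules)

def recognizes(tags, rules):
    """True if the whole tag sequence can be analysed as an S."""
    return any(end == len(tags) for end in parse("S", tags, 0, rules))

if __name__ == "__main__":
    rules = build_rules(GRAMMAR_SHORTHAND)
    print(rules["NP"])                              # the two NP rules of the note
    print(recognizes(["Det", "N", "V"], rules))     # το παιδί παίζει
```

Running build_rules on the shorthand gives exactly the two NP rules listed in the note above and six distinct VP rules.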
But other things may happen. Assume that our parser deals with the string το παιδί παίζει. It starts with the initial symbol S and expands it to NP and VP. Then NP is the non-terminal symbol that has to be expanded. Rather than using the NP rules of Exercise 2, our hypothetical parser uses rules (3) and (4) above. First, it takes rule (3) and finds that it has to look for a terminal symbol Det. It looks at the string and sees that το is a Det. The first step of (3) succeeds, and the parser then looks for an N. It finds παιδί, which is an N. Rule (3) succeeds once more and assigns the category NP to the string το παιδί.

Assume, however, that (4) was the first rule. The parser would have to look for the non-terminal NP, so it would go back to the rule table and pick the same rule (4), and that would go on forever. This is a case of left recursion: the parser would never get out of this loop. With the rules in the order (3), (4), left recursion does not occur with the examples given. Assume, however, that you had to parse the following string: Τα παιδιά τρώνε μήλα. Make the necessary changes to the rules of Exercise 3 in order to ensure that left recursion will not occur with this set of examples.

Exercise 4. LFGW: http://arts.anu.edu.au/linguistics/software/lfgw.html Tutorial1. Develop your first Greek toy grammar: just a couple of rules and four to six lexical entries. Here you go!

2. A formalism that is too strong and too weak at the same time

We were given a set of data and we developed a grammar that recognizes them, that is, it can assign the correct non-terminal labels to the strings. However, this is not all. Ideally, we would like our grammar to recognize the correct strings and reject the incorrect ones. Unfortunately, this does not happen. In fact, our grammar rejects correct data and recognizes incorrect ones.
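Both halves of this claim can be checked mechanically. The Python sketch below uses a reduced version of the Section 1 grammar (the PP expansions are left out for brevity, which is a simplification made here and not part of the notes) and a small word-to-tag table. The name accepts and the helper ends are hypothetical.

```python
# Word-to-tag table built from the terminal symbols of Section 1.
LEXICON = {
    "το": "Det", "τα": "Det", "τη": "Det",
    "παιδί": "N", "παιδιά": "N", "μήλα": "N",
    "παίζει": "V", "παίζουν": "V", "τρώνε": "V", "τρώει": "V",
}

# Reduced grammar: S -> NP VP, NP -> Det N, VP -> V (NP).
# Simplification made for this sketch: PP expansions are omitted.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
}

def ends(symbol, tags, pos):
    """Return the set of positions where `symbol`, started at `pos`, can end."""
    if symbol not in RULES:  # terminal symbol
        return {pos + 1} if pos < len(tags) and tags[pos] == symbol else set()
    result = set()
    for rhs in RULES[symbol]:
        positions = {pos}
        for part in rhs:
            positions = {e for p in positions for e in ends(part, tags, p)}
        result |= positions
    return result

def accepts(sentence):
    """True if the grammar assigns S to the whole sentence."""
    tags = [LEXICON[word] for word in sentence.split()]
    return len(tags) in ends("S", tags, 0)

if __name__ == "__main__":
    print(accepts("το παιδί παίζει"))    # True: correct and accepted
    print(accepts("το παιδί παίζουν"))   # True: incorrect, yet accepted
    print(accepts("το παιδιά τρώνε"))    # True: incorrect, yet accepted
    print(accepts("τρώνε τα μήλα"))      # False: correct, yet rejected
```

The grammar sees only the categories Det, N and V, so it cannot tell singular from plural, and it insists on an NP before the verb, so it has no analysis for a sentence without an explicit subject.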
Consider the following data:

το παιδί παίζουν
το παιδιά τρώνε
το παιδί παίζει στην κήπο
τρώνε μήλα τα παιδιά
τρώνε τα παιδιά (τα) μήλα
τρώνε τα μήλα
η τυρόπιττα τρώει το παιδί

A typology of problems

Agreement in number and gender
το παιδί παίζουν (subject-verb agreement)
το παιδιά τρώνε (determiner-noun agreement)
το παιδί παίζει στην κήπο (gender agreement)

Word order

τρώνε μήλα τα παιδιά (subject follows VP)
τρώνε τα παιδιά (τα) μήλα (the flat Greek VP)

PRO-drop language

τρώνε τα μήλα (no need for an explicit subject)

Word order could be treated with a modification of the rules of our grammar. The same holds for PRO-drop. But look at what happens to VP now. Also discuss the rules marked with an asterisk (*).

S → (NP) VP (NP)
*S → V NP NP
NP → (Det) N (PP)
*VP → V (NP) (PP) (PP)

Exercise 5: For each sentence below, specify the agreement constraints as an add-on to the corresponding CFG rule. Use a Prolog-like notation for this purpose, as shown in the example. Then highlight which part of the constraints is violated. Explain why agreement information should be specified for the left-hand symbol of the NP rule.

το παιδιά τρώνε (determiner-noun agreement)

NP          →   Det         N
Number X        Number X    Number X
Gender Y        Gender Y    Gender Y
Case Z          Case Z      Case Z

το παιδί παίζουν (subject-verb agreement)
το παιδί παίζει στην κήπο (gender agreement)

3. References

Θεοφανοπούλου-Κοντού, Δήμητρα. 1989. Μετασχηματιστική Σύνταξη. Από την θεωρία στην πράξη. Αθήνα: Εκδόσεις Καρδαμίτσα.
Dalrymple, Mary. 2002. Syntax and Semantics 34: Lexical Functional Grammar. Academic Press.