Syntax Analysis Part V

Syntax Analysis Part V Chapter 4: Bottom-Up Parsing Slides adapted from : Robert van Engelen, Florida State University

LR Parsers LR parsers are table-driven algorithms, much like the LL parsers The parse table is used to detect handles A CFG for which we can construct a deterministic parse table is called an LR grammar

LR Parsers The parser maintains states to keep track of where we stand in the parsing process The parser uses a parse table to Update parser states Uniquely detect handles when they appear at the top of the stack

LR(0) Items An LR(0) item is any production with a dot at some position in the right-hand side Thus, a production A X Y Z has four items: [A X Y Z] [A X Y Z] [A X Y Z] [A X Y Z ] Production A ε has a single item [A ]

LR(0) Items The bullet in an item [A α β] marks the portion α of the right-hand side that has already been processed Convention: We add to the grammar a new starting symbol S, and a production of the form S S

LR States States are sets of LR(0) items State I (see figure) records that We have seen A of C A B We have seen ε of B a State I: [C A B] [B a] A C B a

LR States States can represent several alternative trees at a time State I: [E T ] [T T * F] { E T }, T T * F

LR States Trees might share a common left part State I: [T T * F] [F ( F )] [F id] { T T * F id, T T * F } ( ) F

closure() closure(i) takes as input current state I closure(i) returns updated state by applying all possible rule expansions We always have I closure(i)

closure() 1. Start with closure(i) = I 2. If [A α Bβ] closure(i) then for each production B γ in the grammar, add the item [B γ] to I if not already in I 3. Repeat 2 until no new items can be added

Example closure({ [E E] }) = { [E E] } Add [E γ] Grammar: E E + T T T T * F F F ( E ) F id { [E E] [E E + T] [E T]} Add [T γ] { [E E] [E E + T] [E T] [T T * F] [T F] } Add [F γ] { [E E] [E E + T] [E T] [T T * F] [T F] [F ( E )] [F id] }

Kernel Items Kernel items: the initial item [S S] and all items [B α γ] with dot not at the left of right-hand side Nonkernel items: items [B γ] with dot at the left of right-hand side Function closure adds only nonkernel items

goto() goto(i, X) takes as input Current state I Processed grammar symbol X goto(i, X) returns updated state by Moving bullet over processed symbol X Discarding items with symbols other than X after the bullet Applying closure() function

goto() 1. For each [A α X β] I, add [A αx β] to goto(i, X), if not already there 2. Compute closure() of the resulting set Intuitively, goto(i, X) is the set of items that are valid for the viable prefix γx when I is the set of items that are valid for γ

Example Suppose I = { [E E], [E E + T], [E T], [T T * F], [T F], [F ( E )], [F id] } Then goto(i, E) = closure({[e E ], [E E + T]}) = { [E E ], [E E + T] } Grammar: E E + T T T T * F F F ( E ) F id

Example Suppose I = { [E E ], [E E + T] } Then goto(i,+) = closure({[e E + T]}) = { [E E + T], [T T * F], [T F], [F ( E )], [F id] } Grammar: E E + T T T T * F F F ( E ) F id

DFA for Shift/Reduce Decisions Functions closure() and goto() used to construct the set of all possible parsing states, along with their connections This results in a deterministic finite automaton which is used for shift/reduce decisions

DFA for Shift/Reduce Decisions start 0 C a A 1 2 3 B Grammar: S C C A B A a B a Can only reduce A a (not B a) a 4 5 goto(i 0,C) State I 0 : S C C A B A a goto(i 0,a) State I 1 : S C goto(i 0,A) State I 3 : A a State I 2 : C A B B a State I 4 : C A B goto(i 2,B) goto(i 2,a) State I 5 : B a

DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 0 : S C C A B A a goto(i 0,a) The states of the DFA are used to determine if a handle is on top of the stack State I 3 : A a Stack $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ a$ a$ $ $ $ Next Action shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C)

DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 0 : S C C A B A a goto(i 0,A) The states of the DFA are used to determine if a handle is on top of the stack State I 2 : C A B B a Stack $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ a$ a$ $ $ $ Next Action shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C)

DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 2 : C A B B a goto(i 2,a) The states of the DFA are used to determine if a handle is on top of the stack State I 5 : B a Stack $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ a$ a$ $ $ $ Next Action shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C)

DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 2 : C A B B a goto(i 2,B) The states of the DFA are used to determine if a handle is on top of the stack State I 4 : C A B Stack $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ a$ a$ $ $ $ Next Action shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C)

DFA for Shift/Reduce Decisions Grammar: S C C A B A a B a State I 0 : S C C A B A a The states of the DFA are used to determine if a handle is on top of the stack goto(i 0,C) State I 1 : S C Stack $ 0 $ 0 a 3 $ 0 A 2 $ 0 A 2 a 5 $ 0 A 2 B 4 $ 0 C 1 Input aa$ a$ a$ $ $ $ Next Action shift (and goto state 3) reduce A a (goto 2) shift (goto 5) reduce B a (goto 4) reduce C AB (goto 1) accept (S C)

Model of an LR Parser input a 1 a 2 a i a n $ stack s m X m s m-1 LR Parsing Program (driver) output X m-1 s 0 action shift reduce accept error goto DFA Constructed with LR(0) method, SLR method, LR(1) method, or LALR(1) method

LR Parsing (Driver) Configuration ( = LR parser state): ($ s 0 X 1 s 1 X 2 s 2 X m s m, a i a i+1 a n $) stack input if action[s m, a i ] = shift s then push a i, push s, and advance input: ($ s 0 X 1 s 1 X 2 s 2 X m s m a i s, a i+1 a n $) if action[s m, a i ] = reduce A β and goto[s m-r, A] = s with r = β then pop 2r symbols, push A, and push s: ($ s 0 X 1 s 1 X 2 s 2 X m-r s m-r A s, a i a i+1 a n $) if action[s m,a i ] = accept then stop if action[s m,a i ] = error then attempt recovery

Example LR Parse Table Grammar: 1. E E + T 2. E T 3. T T * F 4. T F 5. F ( E ) 6. F id state 0 1 2 3 4 5 action id + * ( ) $ s5 s4 s6 acc r2 s7 r2 r2 r4 r4 r4 r4 s5 s4 r6 r6 r6 r6 goto E T F 1 2 3 8 2 3 6 s5 s4 9 3 shift & goto 5 7 s5 s4 10 8 s6 s11 reduce by production #1 9 10 11 r1 s7 r1 r1 r3 r3 r3 r3 r5 r5 r5 r5

Example LR Parsing Grammar: 1. E E + T 2. E T 3. T T * F 4. T F 5. F ( E ) 6. F id Stack $ 0 $ 0 id 5 $ 0 F 3 $ 0 T 2 $ 0 T 2 * 7 $ 0 T 2 * 7 id 5 $ 0 T 2 * 7 F 10 $ 0 T 2 $ 0 E 1 $ 0 E 1 + 6 $ 0 E 1 + 6 id 5 $ 0 E 1 + 6 F 3 $ 0 E 1 + 6 T 9 $ 0 E 1 Input id*id+id$ *id+id$ *id+id$ *id+id$ id+id$ +id$ +id$ +id$ +id$ id$ $ $ $ $ Next Action shift 5 reduce 6 goto 3 reduce 4 goto 2 shift 7 shift 5 reduce 6 goto 10 reduce 3 goto 2 reduce 2 goto 1 shift 6 shift 5 reduce 6 goto 3 reduce 4 goto 9 reduce 1 goto 1 accept

Constructing the set of LR(0) States of a Grammar 1. The grammar is augmented with a new start symbol S and unique production S S 2. Initially, set C = { closure({[s S]}) } (this is the start state of the DFA) 3. For each set of items I C and each grammar symbol X (N T) such that goto(i, X) C and goto(i, X), add the set of items goto(i, X) to C 4. Repeat 3 until no more sets can be added to C

LR(0) Grammars Construct FA automaton based on LR(0) states and goto() function If there is no conflict in action table then grammar is LR(0) Shift-reduce parser based on LR(0) table is too restricted and never used; need to be extended

SLR Grammars SLR (Simple LR): a simple extension of LR(0) shift-reduce parsing SLR eliminates some LR(0) conflicts by populating the parsing table with reductions A α on symbols in FOLLOW(A) S E E id + E E id State I 0 : S E E id + E E id State I 2 : Shift on + goto(i 0, id) E id + E E id goto(i 2,+) FOLLOW(E) = {$} thus reduce on $

SLR Parsing Table Reductions do not fill entire rows Otherwise the same as LR(0) id + $ E 1. S E 2. E id + E 3. E id 0 1 2 3 s2 s2 s3 acc r3 1 4 Shift on + 4 r2 FOLLOW(E) = {$} thus reduce on $

SLR Parsing Build the LR(0) DFA Construct LR(0) items (states) Determine transitions Construct the SLR parsing table from the DFA LR parser program uses the SLR parsing table to determine shift/reduce operations

Constructing SLR Parsing Tables 1. Augment the grammar with S S 2. Construct the set C = {I 0, I 1,, I n } of LR(0) states 3. If [A α aβ] I i and goto(i i, a) = I j then set action[i, a] = shift j 4. If [A α ] I i then set action[i, a] = reduce A α for all a FOLLOW(A) (apply only if A S ) 5. If [S S ] is in I i then set action[i, $] = accept 6. If goto(i i, A) = I j then set goto[i, A] = j 7. Repeat 3-6 until no more entries added 8. The initial state i is the I i holding item [S S]

Example SLR Grammar Augmented grammar: 1. C C 2. C A B 3. A a 4. B a start and LR(0) States I 0 = closure({[c C]}) I 1 = goto(i 0, C) = closure({[c C ]}) goto(i 0, C) State I 0 : C C C A B A a goto(i 0, a) State I 1 : C C goto(i 0, A) State I 3 : A a final State I 2 : C A B B a State I 4 : C A B goto(i 2, B) goto(i 2, a) State I 5 : B a

Example SLR Parsing Table State I 0 : C C C A B A a State I 1 : C C State I 2 : C A B B a State I 3 : A a State I 4 : C A B State I 5 : B a a $ C A B start 0 C a A 1 2 3 B a 4 5 0 1 2 3 4 5 s3 s5 r3 acc r2 r4 1 2 4 Grammar: 1. C C 2. C A B 3. A a 4. B a

SLR and Ambiguity I 0 : S S S L=R S R L *R L id R L Every SLR grammar is unambiguous, but not every unambiguous grammar is SLR Consider for example the unambiguous grammar S L = R R L * R id R L I 1 : S S I 2 : S L =R R L action[2,=]=s6 action[2,=]=r5 I 3 : S R I 4 : L * R R L L *R L id no Has no SLR parsing table I 5 : L id I 6 : S L= R R L L *R L id I 7 : L *R I 8 : R L I 9 : S L=R

Viable Prefixes The LR(0) automaton recognizes strings of grammar symbols that can appear on the stack of a shift reduce parser that works on rightmost derivations If stack holds α and and buffer is x, then S α x * rm

Viable Prefixes A viable prefix is a prefix of a rightmost sentential form that does not continue past the end of the rightmost handle of that form LR(0) automata recognize viable prefixes The LR(0) automaton is the deterministic version of a nondeterministic automaton with ε transitions whose states are LR(0) items

LR(1) Grammars SLR too simple LR(1) parsing uses lookahead to avoid unnecessary conflicts in parsing table LR(1) item = LR(0) item + lookahead LR(0) item: [A α β] LR(1) item: [A α β, a]

SLR Versus LR(1) Split the SLR states by adding LR(1) lookahead Unambiguous grammar 1. S L = R 2. R 3. L * R 4. id 5. R L Lookahead : = S L =R action[2,=]=s6 I 2 : S L =R R L R L split Should not reduce on =, because no right-sentential form begins with R= Lookahead : $ action[2,$]=r5

LR(1) Items An LR(1) item [A α β, a] contains a lookahead terminal a, meaning: with α already on top of the stack, expect to see βa For items of the form [A α, a] the lookahead a is used to reduce A α only if the next input is a For items of the form [A α β, a] with β ε the lookahead has no effect

The Closure Operation for LR(1) Items 1. Start with closure(i) = I 2. If [A α Bβ, a] closure(i) then for each production B γ in the grammar and each terminal b FIRST(βa), add the item [B γ, b] to I if not already in I 3. Repeat 2 until no new items can be added

The Goto Operation for LR(1) Items 1. For each item [A α Xβ, a] I, add [A αx β, a] to goto(i, X), if not already there 2. Repeat step 1 until no more items can be added 3. Compute closure() of the resulting set

Constructing the set of LR(1) Items of a Grammar 1. Augment the grammar with a new start symbol S and production S S 2. Initially, set C = { closure({[s S, $]}) } (start state of the DFA) 3. For each set of items I C and each grammar symbol X (N T) such that goto(i,x) C and goto(i,x), add the set of items goto(i,x) to C 4. Repeat 3 until no more sets can be added to C

Example Grammar and LR(1) Items Unambiguous LR(1) grammar: S L = R R L * R id R L Augment with S S LR(1) items (next slide)

I 0 : I 1 : I 2 : I 3 : I 4 : I 5 : [S S, $] goto(i 0,S)=I 1 [S L=R, $] goto(i 0,L)=I 2 [S R, $] goto(i 0,R)=I 3 [L *R, =/$] goto(i 0,*)=I 4 [L id, =/$] goto(i 0,id)=I 5 [R L, $] goto(i 0,L)=I 2 [S S, $] [S L =R, $] goto(i 2,=)=I 6 [R L, $] [S R, $] [L * R, =/$] goto(i 4,R)=I 7 [R L, =/$] goto(i 4,L)=I 8 [L *R, =/$] goto(i 4,*)=I 4 [L id, =/$] goto(i 4,id)=I 5 [L id, =/$] I 6 : I 7 : I 8 : I 9 : I 10 : I 11 : I 12 : I 13 : [S L= R, $] goto(i 6,R)=I 9 [R L, $] goto(i 6,L)=I 10 [L *R, $] goto(i 6,*)=I 11 [L id, $] goto(i 6,id)=I 12 [L *R, =/$] [R L, =/$] [S L=R, $] [R L, $] [L * R, $] goto(i 11,R)=I 13 [R L, $] goto(i 11,L)=I 10 [L *R, $] goto(i 11,*)=I 11 [L id, $] goto(i 11,id)=I 12 [L id, $] [L *R, $] shorthand for two items [R L, =] [R L, $]

Constructing Canonical LR(1) Parsing Tables 1. Augment the grammar with S S 2. Construct the set C={I 0,I 1,,I n } of LR(1) items 3. If [A α aβ, b] I i and goto(i i,a)=i j then set action[i,a]=shift j 4. If [A α, a] I i then set action[i,a]=reduce A α (apply only if A S ) 5. If [S S, $] is in I i then set action[i,$]=accept 6. If goto(i i,a)=i j then set goto[i,a]=j 7. Repeat 3-6 until no more entries added 8. The initial state i is the I i holding item [S S,$]

0 Example LR(1) Parsing Table id * = $ s5 s4 S L R 1 2 3 1 acc 2 s6 r6 Grammar: 1. S S 2. S L = R 3. S R 4. L * R 5. L id 6. R L 3 4 5 6 7 8 9 s5 s12 s4 s11 r5 r4 r6 r3 r5 r4 r6 r2 8 7 10 4 10 r6 11 s12 s11 10 13 12 r5 13 r4

LALR(1) Grammars LR(1) parsing tables have many states LALR(1) parsing (Look-Ahead LR) merges LR(1) states to reduce table size Less powerful than LR(1) Will not introduce shift-reduce conflicts, because shifts do not use lookaheads May introduce reduce-reduce conflicts, but seldom do so for grammars of programming languages

Constructing LALR(1) Parsing Tables 1. Construct sets of LR(1) items 2. Combine LR(1) sets with sets of items that share the same first part I 4 : I 11 : [L * R, =] [R L, =] [L *R, =] [L id, =] [L * R, $] [R L, $] [L *R, $] [L id, $] [L * R, =/$] [R L, =/$] [L *R, =/$] [L id, =/$] Shorthand for two items in the same set

Example LALR(1) Grammar Unambiguous LR(1) grammar: S L = R R L * R id R L Augment with S S LR(1) and LALR(1) : items (next slides)

I 0 : I 1 : [S S, $] goto(i 0,S)=I 1 [S L=R, $] goto(i 0,L)=I 2 [S R, $] goto(i 0,R)=I 3 [L *R, =/$] goto(i 0,*)=I 4 [L id, =/$] goto(i 0,id)=I 5 [R L, $] goto(i 0,L)=I 2 goto(i 0,=)=I 6 [S S, $] I 6 : I 7 : I 8 : [S L= R, $] goto(i 6,R)=I 9 [R L, $] goto(i 6,L)=I A [L *R, $] goto(i 6,*)=I B [L id, $] goto(i 6,id)=I C [L *R, =/$] [R L, =/$] I 2 : I 3 : I 4 : I 5 : [S L =R, $] goto(i 2,=)=I 6 [R L, $] [S R, $] [L * R, =/$] goto(i 4,R)=I 7 [R L, =/$] goto(i 4,L)=I 8 [L *R, =/$] goto(i 4,*)=I 4 [L id, =/$] goto(i 4,id)=I 5 [L id, =/$] I 9 : I A : I B : I C : I D : [S L=R, $] [R L, $] [L * R, $] goto(i B,R)=I D [R L, $] goto(i B,L)=I A [L *R, $] goto(i B,*)=I B [L id, $] goto(i B,id)=I C [L id, $] [L *R, $]

I 0 : I 1 : [S S, $] goto(i 0,S)=I 1 [S L=R, $] goto(i 0,L)=I 2 [S R, $] goto(i 0,R)=I 3 [L *R, =/$] goto(i 0,*)=I 4 [L id, =/$] goto(i 0,id)=I 5 [R L, $] goto(i 0,L)=I 2 goto(i 0,=)=I 6 [S S, $] I 6 : I 7D : I 8A : [S L= R, $] goto(i 6,R)=I 9 [R L, $] goto(i 6,L)=I 8A [L *R, $] goto(i 6,*)=I 4B [L id, $] goto(i 6,id)=I 5C [L *R, =/$] [R L, =/$] I 2 : [S L =R, $] [R L, $] I 9 : [S L=R, $] I 3 : [S R, $] I 4B : [L * R, =/$] goto(i 4B,R)=I 7 [R L, =/$] goto(i 4B,L)=I 9 [L *R, =/$] goto(i 4B,*)=I 4B [L id, =/$] goto(i 4B,id)=I 5 I 5C : [L id, =/$]

Example LALR(1) Parsing Table id * = $ S L R 0 s5 s4 1 2 3 Grammar: 1. S S 2. S L = R 3. S R 4. L * R 5. L id 6. R L 1 2 3 4 5 6 7 s5 s5 s4 s4 s6 r5 r4 acc r6 r3 r5 r4 9 7 9 8 8 r2 9 r6 r6

LL, SLR, LR, LALR Summary LL parse tables computed using FIRST/FOLLOW Nonterminals terminals productions LR parsing tables computed using closure/goto LR states terminals shift/reduce actions LR states nonterminals goto state transitions A grammar is LL(1) if its LL(1) parse table has no conflicts SLR if its SLR parse table has no conflicts LALR(1) if its LALR(1) parse table has no conflicts LR(1) if its LR(1) parse table has no conflicts

LL, SLR, LR, LALR Grammars LR(1) LALR(1) LL(1) SLR LR(0)

Compaction of LR Parsing Tables In programming languages, hundred of states and tens of thousands of table entries Several table rows might be equal: Share rows using pointers Tables might be sparse: Use adjacency representation

Dealing with Ambiguous Grammars 1. S E 2. E E + E 3. E id id + $ s2 s2 s3 r3 s3/r2 acc Shift/reduce conflict: action[4,+] = shift 3 action[4,+] = reduce E E + E 0 1 2 3 4 r3 r2 E 1 4 $ 0 stack $ 0 E 1 + 3 E 4 input id+id+id$ +id$ When shifting on +: yields right associativity id+(id+id) When reducing on +: yields left associativity (id+id)+id

Using Associativity and Precedence to Resolve Conflicts Left-associative operators: reduce Right-associative operators: shift Operator of higher precedence on stack: reduce Operator of lower precedence on stack: shift S E E E + E E E * E E id $ 0 stack $ 0 E 1 * 3 E 5 input id*id+id$ +id$ reduce E E * E

Error Detection in LR Parsing Canonical LR parser uses full LR(1) parse tables and will never make a single reduction before recognizing the error when a syntax error occurs on the input SLR and LALR may still reduce when a syntax error occurs on the input, but will never shift the erroneous input symbol

Error Recovery in LR Parsing Panic mode Pop until state with a goto on a nonterminal A is found, (where A represents a major programming construct) Push A and push goto state Discard input symbols until one is found in the FOLLOW set of A

Error Recovery in LR Parsing Phrase-level recovery Implement error routines for every error entry in table Error productions Pop until state has error production, push lefthand side Discard input until symbol is encountered that allows parsing to continue