Syntax Analysis Part IV

Syntax Analysis Part IV Chapter 4: Bottom-Up Parsing Sles adapted from : Robert van Engelen, Flora State University

Bottom-Up Parsing LR methods (Left-to-right, Rightmost derivation) SLR, Canonical LR, LALR Other cases: Shift-reduce parsing Operator-precedence parsing

Bottom-Up Parsing The process of reducing an input string w to the start symbol of the grammar At each reduction step, a specific substring matching the right-hand se of a production is replaced by the nonterminal at the lefthand se of that production

Bottom-Up Parsing Grammar E E + T T T T * F F F ( E ) Rightmost derivation E rm T rm T * F rm T * rm F * rm *

Bottom-Up Parsing * F * T * T * F T E F F T * F T F T * F F E rm T rm T * F rm T * rm F * rm *

Handle Pruning A rightmost derivation in reverse can be obtained by handle-pruning A handle is a substring of grammar symbols that matches a right-hand se of a production in a rightmost sentential form

Handle Pruning Locate the handle and replace by the lefthand se of the production S A Step in a rightmost derivation α β x

Handle Pruning Grammar: S a A B e A A b c b B d Underlined substrings are handles Reducing a sentence: a b b c d e a A b c d e a A d e a A B e S Shift-reduce corresponds to a rightmost derivation: S rm a A B e rm a A d e rm a A b c d e rm a b b c d e S A A A A A A B A B a b b c d e a b b c d e a b b c d e a b b c d e

Handle Pruning Grammar: S a A B e A A b c b B d a b b c d e a A b c d e a A d e a A B e S a b b c d e a A b c d e a A A e? Handles NOT a handle, because further reductions will fail (result is not a sentential form)

Exercise Conser the grammar: E E E + E E - E ( E ) What is the correct series of reductions for the string -( + ) +?

Exercise (cont d) -( + ) + -( + E ) + -( + E) + -(E + E) + -(E) + -E + E + E + E E + E E E E E + E E - E ( E )

Exercise (cont d) -( + ) + -(E + ) + -(E + E ) + -(E + E ) + E -(E + E) + E -(E) + E -E + E E + E E + E E E E E + E E - E ( E )

Exercise (cont d) -( + ) + -(E + ) + -(E + E ) + -(E + E) + -(E) + -E + E + E + E E + E E E E E + E E - E ( E )

Shift-Reduce Parsing Shift-reduce is a family of bottom-up parsers based on handle pruning Use a stack data structure and four operations Shift: move symbol from input to stack Reduce: replace handle in stack with lhs Accept Reject

Shift-Reduce Parsing Grammar: E E + E E E * E E ( E ) E Find handles to reduce Stack $ $ $E $E+ $E+ $E+E $E+E* $E+E* $E+E*E $E+E $E Input +*$ +*$ +*$ *$ *$ *$ $ $ $ $ $ Next Action shift reduce E shift shift reduce E shift (or reduce?) shift reduce E reduce E E * E reduce E E + E accept How to resolve conflicts?

Exercise Conser the grammar: E E E + E E - E ( E ) What is the correct shift-reduce parse for the string + -?

Exercise (cont d) Stack $ $ $E + $E +- $E +- $E +-E $E +-E $E +E $E +E $E Input +-$ +-$ -$ $ $ $ $ $ $ $ E E E + E E - E ( E )

Exercise (cont d) Stack $ $ $+ $+- $+- $+-E $+E $+E $E +E $E Input +-$ +-$ -$ $ $ $ $ $ $ $ E E E + E E - E ( E )

Exercise (cont d) Stack $ $ $E $E + $E +- $E +- $E +-E $E +E $E +E $E Input +-$ +-$ +-$ -$ $ $ $ $ $ $ E E E + E E - E ( E )

Exercise Conser the grammar: E E E + E E - E ( E ) Identify the handle for the shift-reduce parse state $E + -, + -( + )$ E + - - E + -E

Handle and Stack The handle will always eventually appear at the top of the stack (not at a deeper position) Case 1: B inse A S A B α β γ y z * S α A z rm α β B y z rm α β γ y z rm

Handle and Stack The handle will always eventually appear at the top of the stack Case 2: B and A on different branches B S A α γ x y z * S α B x A z rm α B x y z rm α γ x y z rm

Conflicts Shift-reduce and reduce-reduce conflicts are caused by Ambiguity of the grammar The limitations of the parsing method (even when the grammar is unambiguous)

Shift-Reduce Conflicts Ambiguous grammar: S if E then S if E then S else S other Stack $ $ if E then S Input $ else $ Next Action shift or reduce? Resolve in favor of shift, so else matches closest if

Reduce-Reduce Conflicts Grammar: C A B A a B a Stack $ $a Input aa$ a$ Action shift reduce A a or B a? Resolve in favor of reduce A a, otherwise we are stuck!

Rightmost Derivations Handle recognition is related to the recognition of rightmost sentential forms We need to develop a formal model for the above language

Rightmost Derivations Grammar : E E + T T T T * F F F ( E ) E E E + T + T T * F Derivation : E rm E + T T F F F

Rightmost Derivations Grammar : E E + T T T T * F F F ( E ) E E E + T + T T * F Derivation : E rm E + T rm E + T * F T F F F

Rightmost Derivations Grammar : E E + T T T T * F F F ( E ) E E E + T + T T * F Derivation : E rm E + T rm E + T * F rm E + T * T F F F

Rightmost Derivations Grammar : E E + T T T T * F F F ( E ) E E E + T + T T * F Derivation : E rm E + T rm E + T * F * rm E + * T F F F

Rightmost Derivations Grammar : E E + T T T T * F F F ( E ) E E E + T + T T * F Derivation : E rm E + T * rm E + * * rm E + + * T F F F

Rightmost Derivations In order to cut a derivation tree into a rightmost sentential form, iterate the following steps Vertically descend one step Move right horizontally zero or more steps within the current rule Stop when the right corner of the tree is reached

Viable Prefixes A viable prefix is the prefix of a rightmost sentential form, up to the part of terminal symbols already derived We can show that a viable prefix is the tree cut obtained through the vertical-horizontal procedure, up the the point in which the yield of the tree is reached (proof omitted)

Viable Prefixes viable prefix E + T * E E E + T + T T * F already derived terminals T F F F

Viable Prefixes A viable prefix can always be decomposed into the concatenation of prefixes of righthand ses of productions

Viable Prefixes Viable prefix : E + T * Concatenation of prefixes of righthand ses in the cut E T F E E + T + T T * F F F

Viable Prefixes A viable prefix can be decomposed into prefixes of right-hand ses in more than one way

Viable Prefixes S S A B T A B C R C D D Viable prefix : A B C D Viable prefix : A B C D

Viable Prefixes For any CFG G, the set of all possible viable prefixes of G is a regular language (proof omitted) We prove the construction of a NFA for the viable prefixes of G

LR(0) Items An LR(0) item is any production of G with a dot at some position in the right-hand se Thus, a production A X Y Z has four items: [A X Y Z] [A X Y Z] [A X Y Z] [A X Y Z ] Production A ε has a single item [A ]

LR(0) Items The bullet in an item [A α β] marks the portion α of the right-hand se that has already been horizontally processed Convention: We add to the grammar a new starting symbol S, and a production of the form S S

NFA for Viable Prefixes Input alphabet is the set of nonterminal symbols of G union the set of terminal symbols of G State set is the set of LR(0) items Initial state: [S S] Every state is also a final state NFA is partial

NFA for Viable Prefixes For each item [A α X β] add the following transitions, represented by relation For each nonterminal/terminal symbol X [A α X β] X [A α X β] For each production X γ in G [A α X β] ε [X γ]

NFA for Viable Prefixes Transitions on nonterminal/terminal symbol X are used to simulate horizontal part of the tree cut Transitions on symbol ε are used to simulate vertical part of the tree cut

NFA : Example Augmented grammar : S E E T + E T T int * T int ( E )

NFA : Example [S E] Augmented grammar : S E E T + E T T int * T int ( E )

NFA : Example [S E ] E [S E] ε [E T + E ] ε [E T] Augmented grammar : S E E T + E T T int * T int ( E )

NFA : Example ε [E T] T ε [E T ] ε [T ( E )] [T int * T ] [T int] Augmented grammar : S E E T + E T T int * T int ( E )

NFA : Example [T ( E )] [T ( E )] ( Augmented grammar : S E E T + E T T int * T int ( E )

NFA : Example [E T + E ] ε E [T ( E )] [T ( E )] [E T] ε Augmented grammar : S E E T + E T T int * T int ( E )

NFA : Example [S E ] E [S E] ε [E T + E ] ε E [T ( E )] [T ( E )] ε ε [E T] T ε [E T ] ε ε [T ( E )] ( [T int * T ] [T int] Augmented grammar : S E E T + E T T int * T int ( E )

DFA for Viable Prefixes The NFA for viable prefixes can be made deterministic Each state of the resulting DFA is a set of LR(0) items

DFA for Viable Prefixes Each path through the DFA represents several ways of cutting a viable prefix into prefixes of rule right-hand ses If item [X γ ] belongs to a state of the DFA, we have found a handle

LR Parser The LR parser uses the DFA for viable prefixes to detect handles The DFA is run on the stack of the LR parser To avo reading the full stack at each step, we save DFA states for each symbol in the stack