Top down vs. bottom up parsing. Top down vs. bottom up parsing



Σχετικά έγγραφα
Chapter 4 (c) parsing

The Simply Typed Lambda Calculus

Syntax Analysis Part IV

LECTURE 2 CONTEXT FREE GRAMMARS CONTENTS

derivation of the Laplacian from rectangular to spherical coordinates

HOMEWORK 4 = G. In order to plot the stress versus the stretch we define a normalized stretch:

Chap. 6 Pushdown Automata

Homework 3 Solutions

Lecture 2. Soundness and completeness of propositional logic

3.4 SUM AND DIFFERENCE FORMULAS. NOTE: cos(α+β) cos α + cos β cos(α-β) cos α -cos β

2 Composition. Invertible Mappings

C.S. 430 Assignment 6, Sample Solutions

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 6/5/2006

Section 7.6 Double and Half Angle Formulas

Phys460.nb Solution for the t-dependent Schrodinger s equation How did we find the solution? (not required)

TMA4115 Matematikk 3

PARTIAL NOTES for 6.1 Trigonometric Identities

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 19/5/2007

Other Test Constructions: Likelihood Ratio & Bayes Tests

Physical DB Design. B-Trees Index files can become quite large for large main files Indices on index files are possible.

EE512: Error Control Coding

Math 6 SL Probability Distributions Practice Test Mark Scheme

Assalamu `alaikum wr. wb.

The challenges of non-stable predicates

Overview. Transition Semantics. Configurations and the transition relation. Executions and computation

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

Section 8.3 Trigonometric Equations

Syntax Analysis Part V

Inverse trigonometric functions & General Solution of Trigonometric Equations

From the finite to the transfinite: Λµ-terms and streams

Lecture 2: Dirac notation and a review of linear algebra Read Sakurai chapter 1, Baym chatper 3

Block Ciphers Modes. Ramki Thurimella

Συστήματα Διαχείρισης Βάσεων Δεδομένων

Syntax Analysis Part I

Jesse Maassen and Mark Lundstrom Purdue University November 25, 2013

Fractional Colorings and Zykov Products of graphs

Finite Field Problems: Solutions

Απόκριση σε Μοναδιαία Ωστική Δύναμη (Unit Impulse) Απόκριση σε Δυνάμεις Αυθαίρετα Μεταβαλλόμενες με το Χρόνο. Απόστολος Σ.

Instruction Execution Times

Μηχανική Μάθηση Hypothesis Testing

Development of a CFPSG. Coverage of linguistic phenomena such as agreement and word order

Partial Differential Equations in Biology The boundary element method. March 26, 2013

ΚΥΠΡΙΑΚΟΣ ΣΥΝΔΕΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY 21 ος ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ Δεύτερος Γύρος - 30 Μαρτίου 2011

How to register an account with the Hellenic Community of Sheffield.

Srednicki Chapter 55

Modbus basic setup notes for IO-Link AL1xxx Master Block

6.1. Dirac Equation. Hamiltonian. Dirac Eq.

9.09. # 1. Area inside the oval limaçon r = cos θ. To graph, start with θ = 0 so r = 6. Compute dr

DESIGN OF MACHINERY SOLUTION MANUAL h in h 4 0.

On a four-dimensional hyperbolic manifold with finite volume

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 24/3/2007

Example Sheet 3 Solutions

ω ω ω ω ω ω+2 ω ω+2 + ω ω ω ω+2 + ω ω+1 ω ω+2 2 ω ω ω ω ω ω ω ω+1 ω ω2 ω ω2 + ω ω ω2 + ω ω ω ω2 + ω ω+1 ω ω2 + ω ω+1 + ω ω ω ω2 + ω

Section 9.2 Polar Equations and Graphs

Every set of first-order formulas is equivalent to an independent set

Example of the Baum-Welch Algorithm

Dynamic types, Lambda calculus machines Section and Practice Problems Apr 21 22, 2016

Second Order Partial Differential Equations

CHAPTER 48 APPLICATIONS OF MATRICES AND DETERMINANTS

Nowhere-zero flows Let be a digraph, Abelian group. A Γ-circulation in is a mapping : such that, where, and : tail in X, head in

Εργαστήριο Ανάπτυξης Εφαρμογών Βάσεων Δεδομένων. Εξάμηνο 7 ο

Advanced Subsidiary Unit 1: Understanding and Written Response

14 Lesson 2: The Omega Verb - Present Tense

Math221: HW# 1 solutions

Approximation of distance between locations on earth given by latitude and longitude

Right Rear Door. Let's now finish the door hinge saga with the right rear door

b. Use the parametrization from (a) to compute the area of S a as S a ds. Be sure to substitute for ds!

[1] P Q. Fig. 3.1

CRASH COURSE IN PRECALCULUS

Statistical Inference I Locally most powerful tests

ΕΛΛΗΝΙΚΗ ΔΗΜΟΚΡΑΤΙΑ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ. Ψηφιακή Οικονομία. Διάλεξη 7η: Consumer Behavior Mαρίνα Μπιτσάκη Τμήμα Επιστήμης Υπολογιστών

6.3 Forecasting ARMA processes

department listing department name αχχουντσ ϕανε βαλικτ δδσϕηασδδη σδηφγ ασκϕηλκ τεχηνιχαλ αλαν ϕουν διξ τεχηνιχαλ ϕοην µαριανι

Στο εστιατόριο «ToDokimasesPrinToBgaleisStonKosmo?» έξω από τους δακτυλίους του Κρόνου, οι παραγγελίες γίνονται ηλεκτρονικά.

MATH423 String Theory Solutions 4. = 0 τ = f(s). (1) dτ ds = dxµ dτ f (s) (2) dτ 2 [f (s)] 2 + dxµ. dτ f (s) (3)

(C) 2010 Pearson Education, Inc. All rights reserved.

Partial Trace and Partial Transpose

the total number of electrons passing through the lamp.

Capacitors - Capacitance, Charge and Potential Difference

Εγκατάσταση λογισμικού και αναβάθμιση συσκευής Device software installation and software upgrade

Concrete Mathematics Exercises from 30 September 2016

Μεταγλωττιστές. Δημήτρης Μιχαήλ. Ακ. Έτος Ανοδικές Μέθοδοι Συντακτικής Ανάλυσης. Τμήμα Πληροφορικής και Τηλεματικής Χαροκόπειο Πανεπιστήμιο

Bounding Nonsplitting Enumeration Degrees

Section 1: Listening and responding. Presenter: Niki Farfara MGTAV VCE Seminar 7 August 2016

Γιπλυμαηική Δπγαζία. «Ανθπυποκενηπικόρ ζσεδιαζμόρ γέθςπαρ πλοίος» Φοςζιάνηρ Αθανάζιορ. Δπιβλέπυν Καθηγηηήρ: Νηθφιανο Π. Βεληίθνο

forms This gives Remark 1. How to remember the above formulas: Substituting these into the equation we obtain with

( ) 2 and compare to M.

Business English. Ενότητα # 9: Financial Planning. Ευαγγελία Κουτσογιάννη Τμήμα Διοίκησης Επιχειρήσεων

About these lecture notes. Simply Typed λ-calculus. Types

ΕΛΛΗΝΙΚΗ ΔΗΜΟΚΡΑΤΙΑ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ. Ψηφιακή Οικονομία. Διάλεξη 10η: Basics of Game Theory part 2 Mαρίνα Μπιτσάκη Τμήμα Επιστήμης Υπολογιστών

Practice Exam 2. Conceptual Questions. 1. State a Basic identity and then verify it. (a) Identity: Solution: One identity is csc(θ) = 1

2. THEORY OF EQUATIONS. PREVIOUS EAMCET Bits.

Πρόβλημα 1: Αναζήτηση Ελάχιστης/Μέγιστης Τιμής

Abstract Storage Devices

Main source: "Discrete-time systems and computer control" by Α. ΣΚΟΔΡΑΣ ΨΗΦΙΑΚΟΣ ΕΛΕΓΧΟΣ ΔΙΑΛΕΞΗ 4 ΔΙΑΦΑΝΕΙΑ 1

Fourier Series. MATH 211, Calculus II. J. Robert Buchanan. Spring Department of Mathematics

Matrices and Determinants

Homework 8 Model Solution Section

Η αλληλεπίδραση ανάμεσα στην καθημερινή γλώσσα και την επιστημονική ορολογία: παράδειγμα από το πεδίο της Κοσμολογίας

Transcript:

CMSC 331 notes (9/17/2004) 1 Top down vs. bottom up parsing A grammar describes the strings of tokens that are syntactically legal in a PL A recogniser simply accepts or rejects strings. A generator produces sentences in the language described by the grammar A parser construct a derivation or parse tree for a sentence (if possible) Two common types of parsers: bottom-up or data driven top-down or hypothesis driven A recursive descent parser is a way to implement a top-down parser that is particularly simple. 2 Top down vs. bottom up parsing The parsing problem is to connect the root node S S with the tree leaves, the input Top-down parsers: starts constructing the parse tree at the top (root) of the parse tree and move down towards the leaves. Easy to implement by hand, but work with restricted grammars. examples: - Predictive parsers (e.g., LL(k)) A = 1 + 3 * 4 / 5 Bottom-up parsers: build the nodes on the bottom of the parse tree first. Suitable for automatic parser generation, handle a larger class of grammars. examples: shift-reduce parser (or LR(k) parsers) Both are general techniques that can be made to work for all languages (but not all grammars!). 3 Both are general techniques that can be made to work for all languages (but not all grammars!). Recall that a given language can be described by several grammars. Both of these grammars describe the same language The first one, with it s left recursion, causes problems for top down parsers. For a given parsing technique, we may have to transform the grammar to work with it. 4 E -> E + Num E -> Num E -> Num + E E -> Num

CMSC 331 notes (9/17/2004) How hard is the parsing task? Parsing an arbitrary Context Free Grammar is O(n 3 ), e.g., it can take time proportional the cube of the number of symbols in the input. This is bad! If we constrain the grammar somewhat, we can always parse in linear time. This is good! Linear-time parsing LL parsers Recognize LL grammar Use a top-down strategy LR parsers Recognize LR grammar Use a bottom-up strategy 5 LL(n) : Left to right, Leftmost derivation, look ahead at most n symbols. LR(n) : Left to right, Right derivation, look ahead at most n symbols. Simplest method is a full-backup, recursive descent parser Often used for parsing simple languages Write recursive recognizers (subroutines) for each grammar rule If rules succeeds perform some action (i.e., build a tree node, emit code, etc.) If rule fails, return failure. Caller may try another choice or fail On failure it backs up 6 Problems When going forward, the parser consumes tokens from the input, so what happens if we have to back up? Algorithms that use backup tend to be, in general, inefficient Grammar rules which are left-recursive lead to nontermination! 7 Example: For the grammar: <term> -> <factor> {(* /)<factor>}* We could use the following recursive descent parsing subprogram (this one is written in C) void term() { factor(); /* parse first factor*/ while (next_token == ast_code next_token == slash_code) { lexical(); /* get next token */ factor(); /* parse next factor */ } } 8

CMSC 331 notes (9/17/2004) Left-recursive grammars Some grammars cause problems for top down parsers. Top down parsers do not work with leftrecursive grammars. E.g., one with a rule like: E -> E + T We can transform a left-recursive grammar into one which is not. A top down grammar can limit backtracking if it only has one rule per non-terminal The technique of rule factoring can be used to eliminate multiple rules for a non-terminal. 9 A grammar is left recursive if it has rules like X -> X β Or if it has indirect left recursion, as in X -> A β A -> X Why is this a problem? Consider E -> E + Num E -> Num We can manually or automatically rewrite a grammar to remove left-recursion, making it suitable for a top-down parser. 10 Consider the left-recursive grammar S S α β S generates all strings starting with a β and followed by a number of α Can rewrite using right-recursion S βs S αs In general S S α 1 S α n β 1 β m All strings derived from S start with one of β 1,,β m and continue with several instances of α 1,,α n Rewrite as S β 1 S β m S S α 1 S α n S 11 12

CMSC 331 notes (9/17/2004) The grammar S A α δ A S β is also left-recursive because S + S β α where ->+ means can be rewritten in one or more steps This indirect left-recursion can also be automatically eliminated Simple and general parsing strategy Left-recursion must be eliminated first but that can be done automatically Unpopular because of backtracking Thought to be too inefficient In practice, backtracking is eliminated by restricting the grammar, allowing us to successfully predict which rule to use. 13 14 Predictive Parser A predictive parser uses information from the first terminal symbol of each expression to decide which production to use. A predictive parser is also known as an LL(k) parser because it does a Left-to-right parse, a Leftmost-derivation, and k-symbol lookahead. A grammar in which it is possible to decide which production to use examining only the first token (as in the previous example) are called LL(1) LL(1) grammars are widely used in practice. The syntax of a PL can be adjusted to enable it to be described with an LL(1) grammar. Example: consider the grammar S if E then S else S S begin S L S print E L end L ; S L E num = num An S expression starts either with an IF, BEGIN, or PRINT token, and an L expression start with an END or a SEMICOLON token, and an E expression has only one production. 15 16

LL(k) and LR(k) parsers Two important classes of parsers are called LL(k) parsers and LR(k) parsers. The name LL(k) means: L - Left-to-right scanning of the input L - Constructing leftmost derivation k max number of input symbols needed to select a parser action The name LR(k) means: L - Left-to-right scanning of the input R - Constructing rightmost derivation in reverse k max number of input symbols needed to select a parser action So, a LL(1) parser never needs to look ahead more than one input token to know what parser production to apply. 17 CMSC 331 notes (9/17/2004) Consider the grammar E T + E T T int int * T ( E ) Hard to predict because For T, two productions start with int For E, it is not clear how to predict which rule to use A grammar must be left-factored before use for predictive parsing Left-factoring involves rewriting the rules so that, if a non-terminal has more than one rule, each begins with a terminal. 18 Predictive Parsing and Left Factoring Consider the grammar E T + E T T int int * T ( E ) Factor out common prefixes of productions E T X X + E T ( E ) int Y Y * T 19 Consider a rule of the form A -> a B1 a B2 a B3 a Bn A top down parser generated from this grammar is not efficient as it requires backtracking. To avoid this problem we left factor the grammar. collect all productions with the same left hand side and begin with the same symbols on the right hand side combine the common strings into a single production and then append a new non-terminal symbol to the end of this new production create new productions using this new non-terminal for each of the suffixes to the common production. After left factoring the above grammar is transformed into: A > a A1 A1 -> B1 B2 B3 Bn 20

CMSC 331 notes (9/17/2004) LL(1) means that for each non-terminal and token there is only one production Can be specified via 2D tables One dimension for current non-terminal to expand One dimension for next token A table entry contains one production Method similar to recursive descent, except For each non-terminal S We look at the next token a And chose the production shown at [S,a] We use a stack to keep track of pending non-terminals We reject when we encounter an error state We accept when we encounter end-of-input Left-factored grammar E T X X + E T ( E ) int Y Y * T The LL(1) parsing table: E X T Y int T X int Y * * T + + E ( T X ( E ) ) $ 21 22 Consider the [E, int] entry When current non-terminal is E and next input is int, use production E T X This production can generate an int in the first place Consider the [Y, +] entry When current non-terminal is Y and current token is +, get rid of Y Y can be followed by + only in a derivation in which Y Blank entries indicate error situations Consider the [E,*] entry There is no way to derive a string starting with * from non-terminal E 23 YACC uses bottom up parsing. There are two important operations that bottom-up parsers use. They are namely shift and reduce. (In abstract terms, we do a simulation of a Push Down Automata as a finite state automata.) Input: given string to be parsed and the set of productions. Goal: Trace a rightmost derivation in reverse by starting with the input string and working backwards to the start symbol.