Syntax Analysis Part I

Σχετικά έγγραφα
Syntax Analysis Part IV

Syntax Analysis Part V

Chapter 4 (c) parsing

Top down vs. bottom up parsing. Top down vs. bottom up parsing

Chap. 6 Pushdown Automata

The Simply Typed Lambda Calculus

Μεταγλωττιστές. Δημήτρης Μιχαήλ. Ακ. Έτος Συντακτική Ανάλυση. Τμήμα Πληροφορικής και Τηλεματικής Χαροκόπειο Πανεπιστήμιο

derivation of the Laplacian from rectangular to spherical coordinates

Dynamic types, Lambda calculus machines Section and Practice Problems Apr 21 22, 2016

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 19/5/2007

C.S. 430 Assignment 6, Sample Solutions

TMA4115 Matematikk 3

Lecture 2. Soundness and completeness of propositional logic

2 Composition. Invertible Mappings

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

Homework 3 Solutions

Εργαστήριο Ανάπτυξης Εφαρμογών Βάσεων Δεδομένων. Εξάμηνο 7 ο

LECTURE 2 CONTEXT FREE GRAMMARS CONTENTS

EE512: Error Control Coding

Overview. Transition Semantics. Configurations and the transition relation. Executions and computation

The challenges of non-stable predicates

ΚΥΠΡΙΑΚΟΣ ΣΥΝΔΕΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY 21 ος ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ Δεύτερος Γύρος - 30 Μαρτίου 2011

Μεταγλωττιστές. Δημήτρης Μιχαήλ. Ακ. Έτος Ανοδικές Μέθοδοι Συντακτικής Ανάλυσης. Τμήμα Πληροφορικής και Τηλεματικής Χαροκόπειο Πανεπιστήμιο

Lecture 2: Dirac notation and a review of linear algebra Read Sakurai chapter 1, Baym chatper 3

Section 8.3 Trigonometric Equations

PARTIAL NOTES for 6.1 Trigonometric Identities

Ordinal Arithmetic: Addition, Multiplication, Exponentiation and Limit

ω ω ω ω ω ω+2 ω ω+2 + ω ω ω ω+2 + ω ω+1 ω ω+2 2 ω ω ω ω ω ω ω ω+1 ω ω2 ω ω2 + ω ω ω2 + ω ω ω ω2 + ω ω+1 ω ω2 + ω ω+1 + ω ω ω ω2 + ω

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 6/5/2006

Approximation of distance between locations on earth given by latitude and longitude

Models for Probabilistic Programs with an Adversary

Concrete Mathematics Exercises from 30 September 2016

k A = [k, k]( )[a 1, a 2 ] = [ka 1,ka 2 ] 4For the division of two intervals of confidence in R +

Fractional Colorings and Zykov Products of graphs

Inverse trigonometric functions & General Solution of Trigonometric Equations

Section 9.2 Polar Equations and Graphs

Other Test Constructions: Likelihood Ratio & Bayes Tests

Bounding Nonsplitting Enumeration Degrees

Nowhere-zero flows Let be a digraph, Abelian group. A Γ-circulation in is a mapping : such that, where, and : tail in X, head in

Fourier Series. MATH 211, Calculus II. J. Robert Buchanan. Spring Department of Mathematics

Πρόβλημα 1: Αναζήτηση Ελάχιστης/Μέγιστης Τιμής

ΓΡΑΜΜΙΚΟΣ & ΔΙΚΤΥΑΚΟΣ ΠΡΟΓΡΑΜΜΑΤΙΣΜΟΣ

Section 7.6 Double and Half Angle Formulas

Instruction Execution Times

From the finite to the transfinite: Λµ-terms and streams

HOMEWORK 4 = G. In order to plot the stress versus the stretch we define a normalized stretch:

Αλγόριθμοι και πολυπλοκότητα NP-Completeness (2)

Every set of first-order formulas is equivalent to an independent set

Partial Differential Equations in Biology The boundary element method. March 26, 2013

About these lecture notes. Simply Typed λ-calculus. Types

On a four-dimensional hyperbolic manifold with finite volume

SOAP API. Table of Contents

2. THEORY OF EQUATIONS. PREVIOUS EAMCET Bits.

3.4 SUM AND DIFFERENCE FORMULAS. NOTE: cos(α+β) cos α + cos β cos(α-β) cos α -cos β

Numerical Analysis FMN011

ΕΛΛΗΝΙΚΗ ΔΗΜΟΚΡΑΤΙΑ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ. Ψηφιακή Οικονομία. Διάλεξη 7η: Consumer Behavior Mαρίνα Μπιτσάκη Τμήμα Επιστήμης Υπολογιστών

Block Ciphers Modes. Ramki Thurimella

Phys460.nb Solution for the t-dependent Schrodinger s equation How did we find the solution? (not required)

Finite Field Problems: Solutions

forms This gives Remark 1. How to remember the above formulas: Substituting these into the equation we obtain with

Srednicki Chapter 55

Matrices and Determinants

CRASH COURSE IN PRECALCULUS

6.1. Dirac Equation. Hamiltonian. Dirac Eq.

Congruence Classes of Invertible Matrices of Order 3 over F 2

Exercises 10. Find a fundamental matrix of the given system of equations. Also find the fundamental matrix Φ(t) satisfying Φ(0) = I. 1.

Homework 8 Model Solution Section

Advanced Subsidiary Unit 1: Understanding and Written Response

Αλγόριθμοι και πολυπλοκότητα Depth-First Search

Formal Semantics. 1 Type Logic

Απόκριση σε Μοναδιαία Ωστική Δύναμη (Unit Impulse) Απόκριση σε Δυνάμεις Αυθαίρετα Μεταβαλλόμενες με το Χρόνο. Απόστολος Σ.

Practice Exam 2. Conceptual Questions. 1. State a Basic identity and then verify it. (a) Identity: Solution: One identity is csc(θ) = 1

Notes on the Open Economy

Συντακτικές λειτουργίες

LTL to Buchi. Overview. Buchi Model Checking LTL Translating LTL into Buchi. Ralf Huuck. Buchi Automata. Example

EPL 603 TOPICS IN SOFTWARE ENGINEERING. Lab 5: Component Adaptation Environment (COPE)

ΕΛΛΗΝΙΚΗ ΔΗΜΟΚΡΑΤΙΑ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ. Ψηφιακή Οικονομία. Διάλεξη 10η: Basics of Game Theory part 2 Mαρίνα Μπιτσάκη Τμήμα Επιστήμης Υπολογιστών

Example Sheet 3 Solutions

Econ 2110: Fall 2008 Suggested Solutions to Problem Set 8 questions or comments to Dan Fetter 1

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 24/3/2007

FSM Toolkit Exercises Part II

ST5224: Advanced Statistical Theory II

ΑΛΓΟΡΙΘΜΟΙ Άνοιξη I. ΜΗΛΗΣ

Θα χρησιμοποιήσουμε το bison, μια βελτιωμένη έκδοση του yacc. Φροντιστήριο 2ο Εισαγωγή στο YACC. Yacc. Δομή Προγράμματος Yacc

Jesse Maassen and Mark Lundstrom Purdue University November 25, 2013

ΕΙΣΑΓΩΓΗ ΣΤΗ ΣΤΑΤΙΣΤΙΚΗ ΑΝΑΛΥΣΗ

Elements of Information Theory

Capacitors - Capacitance, Charge and Potential Difference

Space-Time Symmetries

ΕΛΛΗΝΙΚΗ ΔΗΜΟΚΡΑΤΙΑ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ. Ψηφιακή Οικονομία. Διάλεξη 8η: Producer Behavior Mαρίνα Μπιτσάκη Τμήμα Επιστήμης Υπολογιστών

ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΥΠΡΟΥ - ΤΜΗΜΑ ΠΛΗΡΟΦΟΡΙΚΗΣ ΕΠΛ 133: ΑΝΤΙΚΕΙΜΕΝΟΣΤΡΕΦΗΣ ΠΡΟΓΡΑΜΜΑΤΙΣΜΟΣ ΕΡΓΑΣΤΗΡΙΟ 3 Javadoc Tutorial

CHAPTER 48 APPLICATIONS OF MATRICES AND DETERMINANTS

Strain gauge and rosettes

Abstract Storage Devices

Durbin-Levinson recursive method

4.6 Autoregressive Moving Average Model ARMA(1,1)

ΠΑΝΕΠΙΣΤΗΜΙΟ ΠΑΤΡΩΝ ΤΜΗΜΑ ΗΛΕΚΤΡΟΛΟΓΩΝ ΜΗΧΑΝΙΚΩΝ ΚΑΙ ΤΕΧΝΟΛΟΓΙΑΣ ΥΠΟΛΟΓΙΣΤΩΝ ΤΟΜΕΑΣ ΣΥΣΤΗΜΑΤΩΝ ΗΛΕΚΤΡΙΚΗΣ ΕΝΕΡΓΕΙΑΣ ΕΡΓΑΣΤΗΡΙΟ ΥΨΗΛΩΝ ΤΑΣΕΩΝ

Συστήματα Διαχείρισης Βάσεων Δεδομένων

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 11/3/2006

SCHOOL OF MATHEMATICAL SCIENCES G11LMA Linear Mathematics Examination Solutions

Transcript:

1 Syntax Analysis Part I Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007-2011

2 Position of a Parser in the Compiler Model Source Program Lexical Analyzer Token, tokenval Get next token Parser and rest of front-end Intermediate representation Lexical error Syntax error Semantic error Symbol Table

3 The Parser A parser implements a C-F grammar The role of the parser is twofold: 1. To check syntax (= string recognizer) And to report syntax errors accurately 2. To invoke semantic actions For static semantics checking, e.g. type checking of expressions, functions, etc. For syntax-directed translation of the source code to an intermediate representation

4 Syntax-Directed Translation One of the major roles of the parser is to produce an intermediate representation (IR) of the source program using syntax-directed translation methods Possible IR output: Abstract syntax trees (ASTs) Control-flow graphs (CFGs) with triples, three-address code, or register transfer list notation WHIRL (SGI Pro64 compiler) has 5 IR levels!

5 Error Handling A good compiler should assist in identifying and locating errors Lexical errors: important, compiler can easily recover and continue Syntax errors: most important for compiler, can almost always recover Static semantic errors: important, can sometimes recover Dynamic semantic errors: hard or impossible to detect at compile time, runtime checks are required Logical errors: hard or impossible to detect

6 Viable-Prefix Property Prefix The viable-prefix property of parsers allows early detection of syntax errors Goal: detection of an error as soon as possible without further consuming unnecessary input How: detect an error as soon as the prefix of the input does not match a prefix of any string in the language for (;) Error is detected here Prefix Error is detected here DO 10 I = 1;0

7 Error Recovery Strategies Panic mode Discard input until a token in a set of designated synchronizing tokens is found Phrase-level recovery Perform local correction on the input to repair the error Error productions Augment grammar with productions for erroneous constructs Global correction Choose a minimal sequence of changes to obtain a global least-cost correction

8 Grammars (Recap) Context-free grammar is a 4-tuple G = (N, T, P, S) where T is a finite set of tokens (terminal symbols) N is a finite set of nonterminals P is a finite set of productions of the form α β where α (N T)* N (N T)* and β (N T)* S N is a designated start symbol

9 Notational Conventions Used Terminals a,b,c, T specific terminals: 0, 1, id, + Nonterminals A,B,C, N specific nonterminals: expr, term, stmt Grammar symbols X,Y,Z (N T) Strings of terminals u,v,w,x,y,z T* Strings of grammar symbols α,β,γ (N T)*

10 Derivations (Recap) The one-step derivation is defined by α A β α γ β where A γ is a production in the grammar In addition, we define is leftmost lm if α does not contain a nonterminal is rightmost rm if β does not contain a nonterminal Transitive closure * (zero or more steps) Positive closure + (one or more steps) The language generated by G is defined by L(G) = {w T* S + w}

11 Derivation (Example) Grammar G = ({E}, {+,*,(,),-,id}, P, E) with productions P = E E + E E E * E E ( E ) E - E E id Example derivations: E - E - id E rm E + E rm E + id rm id + id E * E E * id + id E + id * id + id

12 Chomsky Hierarchy: Language Classification A grammar G is said to be Regular if it is right linear where each production is of the form A w B or A w or left linear where each production is of the form A B w or A w Context free if each production is of the form A α where A N and α (N T)* Context sensitive if each production is of the form α A β α γ β where A N, α,γ,β (N T)*, γ > 0 Unrestricted

13 Chomsky Hierarchy L(regular) L(context free) L(context sensitive) L(unrestricted) Where L(T) = { L(G) G is of type T } That is: the set of all languages generated by grammars G of type T Examples: Every finite language is regular! (construct a FSA for strings in L(G)) L 1 = { a n b n n 1 } is context free L 2 = { a n b n c n n 1 } is context sensitive

14 Parsing Universal (any C-F grammar) Cocke-Younger-Kasimi Earley Top-down (C-F grammar with restrictions) Recursive descent (predictive parsing) LL (Left-to-right, Leftmost derivation) methods Bottom-up (C-F grammar with restrictions) Operator precedence parsing LR (Left-to-right, Rightmost derivation) methods SLR, canonical LR, LALR

15 Top-Down Parsing LL methods (Left-to-right, Leftmost derivation) and recursive-descent parsing E Grammar: E T + T T ( E ) T - E T id E E Leftmost derivation: E lm T + T lm id + T lm id + id E T T T T T T + id + id + id

16 Left Recursion (Recap) Productions of the form A A α β γ are left recursive When one of the productions in a grammar is left recursive then a predictive parser loops forever on certain inputs

17 A General Systematic Left Recursion Elimination Method Input: Grammar G with no cycles or ε-productions Arrange the nonterminals in some order A 1, A 2,, A n for i = 1,, n do for j = 1,, i-1 do replace each A i A j γ with A i δ 1 γ δ 2 γ δ k γ where A j δ 1 δ 2 δ k enddo eliminate the immediate left recursion in A i enddo

18 Immediate Left-Recursion Elimination Rewrite every left-recursive production A A α β γ A δ into a right-recursive production: A β A R γ A R A R α A R δ A R ε

Example Left Recursion Elim. 19 A B C a B C A A b C A B C C a Choose arrangement: A, B, C i = 1: nothing to do i = 2, j = 1: B C A A b B C A B C b a b (imm) B C A B R a b B R B R C b B R ε i = 3, j = 1: C A B C C a C B C B a B C C a i = 3, j = 2: C B C B a B C C a C C A B R C B a b B R C B a B C C a (imm) C a b B R C B C R a B C R a C R C R A B R C B C R C C R ε

20 Left Factoring When a nonterminal has two or more productions whose right-hand sides start with the same grammar symbols, the grammar is not LL(1) and cannot be used for predictive parsing Replace productions A α β 1 α β 2 α β n γ with A α A R γ A R β 1 β 2 β n

21 Predictive Parsing Eliminate left recursion from grammar Left factor the grammar Compute FIRST and FOLLOW Two variants: Recursive (recursive-descent parsing) Non-recursive (table-driven parsing)

22 FIRST (Revisited) FIRST(α) = { the set of terminals that begin all strings derived from α } FIRST(a) = {a} if a T FIRST(ε) = {ε} FIRST(A) = A α FIRST(α) for A α P FIRST(X 1 X 2 X k ) = if for all j = 1,, i-1 : ε FIRST(X j ) then add non-ε in FIRST(X i ) to FIRST(X 1 X 2 X k ) if for all j = 1,, k : ε FIRST(X j ) then add ε to FIRST(X 1 X 2 X k )

23 FOLLOW FOLLOW(A) = { the set of terminals that can immediately follow nonterminal A } FOLLOW(A) = for all (B α A β) P do add FIRST(β)\{ε} to FOLLOW(A) for all (B α A β) P and ε FIRST(β) do add FOLLOW(B) to FOLLOW(A) for all (B α A) P do add FOLLOW(B) to FOLLOW(A) if A is the start symbol S then add $ to FOLLOW(A)

24 LL(1) Grammar A grammar G is LL(1) if it is not left recursive and for each collection of productions A α 1 α 2 α n for nonterminal A the following holds: 1. FIRST(α i ) FIRST(α j ) = for all i j 2. if α i * ε then 2.a. α j * ε for all i j 2.b. FIRST(α j ) FOLLOW(A) = for all i j

25 Non-LL(1) Examples Grammar S S a a S a S a S a R ε R S ε S a R a R S ε Not LL(1) because: Left recursive FIRST(a S) FIRST(a) For R: S * ε and ε * ε For R: FIRST(S) FOLLOW(R)

26 Recursive-Descent Parsing (Recap) Grammar must be LL(1) Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal s syntactic category of input tokens When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information

27 Using FIRST and FOLLOW in a Recursive-Descent Parser expr term rest rest + term rest - term rest ε term id procedure rest(); begin if lookahead in FIRST(+ term rest) then match( + ); term(); rest() else if lookahead in FIRST(- term rest) then match( - ); term(); rest() else if lookahead in FOLLOW(rest) then return else error() end; where FIRST(+ term rest) = { + } FIRST(- term rest) = { - } FOLLOW(rest) = { $ }

28 Non-Recursive Predictive Parsing: Table-Driven Parsing Given an LL(1) grammar G = (N, T, P, S) construct a table M[A,a] for A N, a T and use a driver program with a stack input a + b $ stack X Y Z $ Predictive parsing program (driver) Parsing table M output

29 Constructing an LL(1) Predictive Parsing Table for each production A α do for each a FIRST(α) do add A α to M[A,a] enddo if ε FIRST(α) then for each b FOLLOW(A) do add A α to M[A,b] enddo endif enddo Mark each undefined entry in M error

Example Table E T E R E R + T E R ε T F T R T R * F T R ε F ( E ) id A α FIRST(α) FOLLOW(A) E T E R ( id $ ) E R + T E R + $ ) E R ε ε $ ) T F T R ( id + $ ) T R * F T R * + $ ) T R ε ε + $ ) F ( E ) ( * + $ ) F id id * + $ ) 30 id + * ( ) $ E E T E R E T E R E R E R + T E R E R ε E R ε T T F T R T F T R T R T R ε T R * F T R T R ε T R ε F F id F ( E )

31 Ambiguous grammar S i E t S S R a S R e S ε E b Error: duplicate table entry LL(1) Grammars are Unambiguous A α FIRST(α) FOLLOW(A) S i E t S S R i e $ S a a e $ S R e S e e $ S R ε ε e $ E b b t a b e i t $ S S a S i E t S S R S R E E b S R ε S R e S S R ε

32 Predictive Parsing Program (Driver) push($) push(s) a := lookahead repeat X := pop() if X is a terminal or X = $ then match(x) // moves to next token and a := lookahead else if M[X,a] = X Y 1 Y 2 Y k then push(y k, Y k-1,, Y 2, Y 1 ) // such that Y 1 is on top invoke actions and/or produce IR output else error() endif until X = $

Example Table-Driven Parsing 33 Stack $E $E R T $E R T R F $E R T R id $E R T R $E R $E R T+ $E R T $E R T R F $E R T R id $E R T R $E R T R F* $E R T R F $E R T R id $E R T R $E R $ Input id+id*id$ id+id*id$ id+id*id$ id+id*id$ +id*id$ +id*id$ +id*id$ id*id$ id*id$ id*id$ *id$ *id$ id$ id$ $ $ $ Production applied E T E R T F T R F id T R ε E R + T E R T F T R F id T R * F T R F id T R ε E R ε

34 Panic Mode Recovery Add synchronizing actions to undefined entries based on FOLLOW Pro: Can be automated Cons: Error messages are needed FOLLOW(E) = { ) $ } FOLLOW(E R ) = { ) $ } FOLLOW(T) = { + ) $ } FOLLOW(T R ) = { + ) $ } FOLLOW(F) = { + * ) $ } id + * ( ) $ E E T E R E T E R synch synch E R E R + T E R E R ε E R ε T T F T R synch T F T R synch synch T R T R ε T R * F T R T R ε T R ε F F id synch synch F ( E ) synch synch synch: the driver pops current nonterminal A and skips input till synch token or skips input until one of FIRST(A) is found

35 Phrase-Level Recovery Change input stream by inserting missing tokens For example: id id is changed into id * id Pro: Can be automated Cons: Recovery not always intuitive Can then continue here id + * ( ) $ E E T E R E T E R synch synch E R E R + T E R E R ε E R ε T T F T R synch T F T R synch synch T R insert * T R ε T R * F T R T R ε T R ε F F id synch synch F ( E ) synch synch insert *: driver inserts missing * and retries the production

36 Error Productions E T E R E R + T E R ε T F T R T R * F T R ε F ( E ) id Add error production : T R F T R to ignore missing *, e.g.: id id Pro: Powerful recovery method Cons: Cannot be automated id + * ( ) $ E E T E R E T E R synch synch E R E R + T E R E R ε E R ε T T F T R synch T F T R synch synch T R T R F T R T R ε T R * F T R T R ε T R ε F F id synch synch F ( E ) synch synch