18CSC304J - Compiler Design UNIT 1

 12M:


Phases of a Compiler:


   


  • Each phase transforms the source program from one representation into another representation

  • They communicate with error handlers and the symbol table

  • Lexical Analyzer: reads source program and returns tokens; also called scanner

  • Syntax Analyzer: reads tokens and returns a parse tree; also called parser

  • Semantic Analyzer: uses parse tree to check the source program for semantic consistency

  • Intermediate Code Generation: after semantic analysis, an intermediate code of the source code is generated

  • Code Optimization: Optimizing the intermediate code

  • Code Generation:  takes the optimized representation of the intermediate code and returns the target code


Symbol Table: 

  • data structure holding information about all symbols defined in the source program

  • used as reference by all phases of a compiler.


Example :




Design of a Lexical Analyzer (Lex)

  • LEX is a software tool that automatically constructs a lexical analyzer from a program

  • The lexical analyzer will be of the form:

  • Each pattern pi is a regular expression and each action i is a program fragment to be executed whenever there is a match for lexeme and the input found by the pi

  • The Lex compiler constructs a transition table for a FA from the regular expression pattern in the Lex specification

  • The lexical analyzer itself consists of a simulator that uses this transition table



Lex in use:



General format:


  • The declarations section includes declarations of variables, manifest constants

  • The translation rules have the form: Pattern {Action}

  • The third section holds whatever additional functions are used in the actions



RE to DFA

Refer class notes



4m:


Cousins of Compiler



  • Pre processor produces source program as input for the compiler

  • Some compiler produces assembly code which is passed to the assembler for further processing

  • Loader loads the relocatable machine code to the proper location

  • Link editor allows us to make a single program from several files of relatable machine code



Compiler Construction Tools

  • Parse generator - automatically produce syntax analyzer from a programming language

  • Scanner generator - produces lexical analyzer from a regular expression

  • Syntax directed translation engine - generate intermediate code

  • Code generator - intermediate language into machine language

  • Data flow analysis engines - shows how values are transmitted from one part to other

  • Compiler construction toolkit - integrate set of routines in different phases



Input Buffering

  • Sometimes lexical analyzer needs to look ahead some symbols to decide about the token to return

  • We need to introduce a 2 buffer scheme to handle large look-aheads safely

  • A buffer contains data that is stored for a short amount of time, before it is used

  • Two methods of input buffering 

  • One buffer scheme

  • Two buffer scheme



Role of Lexical analysis. What is pattern, lexeme and tokens?

Role of Lexical analyzer:

  • Lexical analyzer reads the source program character by character to produce tokens


Token:

  • It is a sequence of characters that can be treated as a single logical entity (identifiers, keywords, operators, special symbols, constants)

Pattern:

  • A set of strings in the input for which the same token is produces as output

  • The set of strings are described by a rule called a pattern

Lexeme:

  • It is a sequence of characters in the source program that is matched by the pattern for the token



Error recovery actions in lexical analysis

  • Panic mode: successive characters are ignored until we reach a well formed token

  • Delete a character from the remaining input

  • Insert a missing character into the remaining input

  • Replace a character by another character

  • Transpose two adjacent characters

Comments

Popular posts from this blog

18ECO124T - HUMAN ASSIST DEVICES UNIT 4 & 5 - 12M

18CSE359T - NATURAL LANGUAGE PROCESSING UNIT 1 - 12M

18CSE359T - NATURAL LANGUAGE PROCESSING UNIT 2 & 3 - 12M