18CSC304J - Compiler Design UNIT 1
12M:
Phases of a Compiler:
Each phase transforms the source program from one representation into another
All phases communicate with the error handler and the symbol table
Lexical Analyzer: reads source program and returns tokens; also called scanner
Syntax Analyzer: reads tokens and returns a parse tree; also called parser
Semantic Analyzer: uses parse tree to check the source program for semantic consistency
Intermediate Code Generation: after semantic analysis, an intermediate representation of the source program is generated
Code Optimization: improves the intermediate code so that better (faster or smaller) target code results
Code Generation: takes the optimized intermediate representation and returns the target code
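As a brief illustration (the classic textbook example), consider how the phases transform the statement position = initial + rate * 60:
Lexical analysis: produces the token stream <id,1> <=> <id,2> <+> <id,3> <*> <60>
Syntax analysis: builds a syntax tree with = at the root, id1 on the left, and + over id2 and (* over id3 and 60) on the right
Semantic analysis: checks types and inserts a conversion, e.g. inttofloat(60)
Intermediate code: t1 = inttofloat(60); t2 = id3 * t1; t3 = id2 + t2; id1 = t3
Code optimization: t1 = id3 * 60.0; id1 = id2 + t1
Code generation: target instructions such as LDF R2, id3 / MULF R2, R2, #60.0 / LDF R1, id2 / ADDF R1, R1, R2 / STF id1, R1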
Symbol Table:
a data structure holding information about all symbols (identifiers and their attributes) defined in the source program
used as a reference by all phases of a compiler
Example:
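A minimal sketch in C of what one entry might hold (the field names here are assumptions for illustration, not a fixed layout):

    /* Illustrative symbol-table entry; real compilers store many more attributes. */
    struct symbol {
        char name[64];   /* the identifier's lexeme, e.g. "count" */
        char type[16];   /* its type, e.g. "int" or "float" */
        int  scope;      /* nesting depth at which it was declared */
        int  line;       /* source line of the declaration */
    };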
Design of a Lexical Analyzer (Lex)
LEX is a software tool that automatically constructs a lexical analyzer from a program written in the Lex specification language
The lexical analyzer specification will be of the form:
    p1 { action1 }
    p2 { action2 }
    ...
    pn { actionn }
Each pattern pi is a regular expression, and each actioni is a program fragment to be executed whenever a lexeme matching pi is found in the input
The Lex compiler constructs a transition table for a finite automaton from the regular-expression patterns in the Lex specification
The lexical analyzer itself consists of a simulator that uses this transition table to recognize lexemes
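A minimal C sketch of such a table-driven simulator (the DFA here is a hand-written one recognizing unsigned integers, assumed for illustration rather than produced by Lex):

    #include <stdio.h>
    #include <ctype.h>

    /* States: 0 = start, 1 = in number (accepting), 2 = dead. */
    enum { START = 0, IN_NUM = 1, DEAD = 2 };

    /* move[state][class]: class 0 = digit, class 1 = anything else. */
    static const int move[3][2] = {
        { IN_NUM, DEAD },   /* START  */
        { IN_NUM, DEAD },   /* IN_NUM */
        { DEAD,   DEAD }    /* DEAD   */
    };

    /* Simulate the DFA: return 1 if s is an unsigned integer. */
    static int match_number(const char *s) {
        int state = START;
        for (; *s != '\0'; s++)
            state = move[state][isdigit((unsigned char)*s) ? 0 : 1];
        return state == IN_NUM;
    }

    int main(void) {
        printf("%d %d\n", match_number("1234"), match_number("12a4")); /* prints 1 0 */
        return 0;
    }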
Lex in use:
General format:
    declarations
    %%
    translation rules
    %%
    auxiliary functions
The declarations section includes declarations of variables and manifest constants
The translation rules have the form: Pattern { Action }
The third section holds whatever additional functions are used in the actions
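Putting the three sections together, a minimal Lex specification might look like this sketch (the token actions and helper code are illustrative assumptions):

    %{
    #include <stdio.h>               /* declarations: C code copied verbatim */
    int yylex(void);
    %}
    digit   [0-9]
    letter  [a-zA-Z]
    %%
    {digit}+                      { printf("NUMBER: %s\n", yytext); }
    {letter}({letter}|{digit})*   { printf("ID: %s\n", yytext); }
    [ \t\n]+                      { /* skip whitespace */ }
    %%
    int yywrap(void) { return 1; }   /* auxiliary functions */
    int main(void) { yylex(); return 0; }

Running lex (or flex) on this file produces lex.yy.c, which contains the transition table and the yylex() simulator described above.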
RE to DFA
Refer class notes
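As a quick illustration, the standard textbook example (a|b)*abb gives, after the subset construction and minimization, a DFA such as:

    State         a    b
    q0 (start)    q1   q0
    q1            q1   q2
    q2            q1   q3
    q3 (accept)   q1   q0

For instance, the input abb is accepted via q0 -> q1 -> q2 -> q3.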
4M:
Cousins of Compiler
Preprocessor: produces the (macro-expanded) source program that is given as input to the compiler
Some compilers produce assembly code, which is passed to an assembler for further processing
Loader: loads the relocatable machine code into the proper memory locations
Link editor: allows us to make a single program from several files of relocatable machine code
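As a concrete illustration with the GCC toolchain (one common setup; other toolchains differ):

    gcc -E main.c -o main.i    # preprocessor: expand macros and #includes
    gcc -S main.i -o main.s    # compiler proper: emit assembly code
    as  main.s -o main.o       # assembler: produce relocatable machine code
    gcc main.o -o main         # link editor: combine objects and libraries

The loader's job is then done by the operating system when ./main is executed.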
Compiler Construction Tools
Parser generators: automatically produce syntax analyzers from a grammatical description of a programming language (e.g., Yacc)
Scanner generators: produce lexical analyzers from a regular-expression description of the tokens (e.g., Lex)
Syntax-directed translation engines: produce routines for walking a parse tree and generating intermediate code
Code generators: translate the intermediate language into machine language
Data-flow analysis engines: gather information about how values are transmitted from one part of a program to every other part
Compiler-construction toolkits: provide an integrated set of routines for building the different phases
Input Buffering
Sometimes the lexical analyzer needs to look ahead several symbols beyond the lexeme before it can decide which token to return
We introduce a two-buffer scheme to handle large look-aheads safely
A buffer holds input data for a short time before it is used
Two methods of input buffering:
One-buffer scheme
Two-buffer scheme (buffer pairs; see the sketch below)
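A minimal C sketch of the two-buffer scheme with sentinels (the buffer size, the '\0' sentinel, and all names are assumptions for illustration; a real scanner must also cope with NUL bytes in the input):

    #include <stdio.h>

    #define N 1024                     /* size of each buffer half */
    #define SENTINEL '\0'              /* marks the end of loaded data */

    static char buf[2 * N + 2];        /* two halves, each followed by a sentinel slot */
    static char *forward;              /* scans ahead to find the end of a lexeme */
    static FILE *src;

    /* Fill one half with up to N characters and plant the sentinel after them. */
    static void reload(char *half) {
        size_t n = fread(half, 1, N, src);
        half[n] = SENTINEL;
    }

    static void init(FILE *f) {
        src = f;
        reload(buf);                   /* preload the first half */
        forward = buf;
    }

    /* Return the next character, switching halves when a sentinel is reached. */
    static int next_char(void) {
        if (*forward == SENTINEL) {
            if (forward == buf + N) {                /* end of first half */
                reload(buf + N + 1);
                forward = buf + N + 1;
            } else if (forward == buf + 2 * N + 1) { /* end of second half */
                reload(buf);
                forward = buf;
            } else {
                return EOF;                          /* sentinel mid-half: true end of input */
            }
            if (*forward == SENTINEL) return EOF;    /* nothing left to read */
        }
        return (unsigned char)*forward++;
    }

The point of the sentinel is that the inner scanning loop needs only one test per character (for the sentinel) instead of two separate tests for end-of-buffer and end-of-input.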
Role of the Lexical Analyzer. What are patterns, lexemes, and tokens?
Role of Lexical analyzer:
Lexical analyzer reads the source program character by character to produce tokens
Token:
It is a sequence of characters that can be treated as a single logical entity (identifiers, keywords, operators, special symbols, constants)
Pattern:
A set of strings in the input for which the same token is produced as output
This set of strings is described by a rule called a pattern
Lexeme:
It is a sequence of characters in the source program that is matched by the pattern for the token
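For example, in the statement if (count == 10):

    Token     Lexeme   Pattern
    keyword   if       the characters i, f
    id        count    letter followed by letters or digits
    relop     ==       == or != or < or <= or > or >=
    number    10       any numeric constant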
Error recovery actions in lexical analysis
Panic mode: successive characters are ignored until we reach a well-formed token
Delete a character from the remaining input
Insert a missing character into the remaining input
Replace a character by another character
Transpose two adjacent characters (e.g., turning fi back into if)