Writing a Parser using (F)LEX and YACC/BISON
1 Task Description
We will implement few statements found in C. We will call this language SubC. Given the grammar of SubC we will write the F(LEX) and YACC/BISON specification file that parses code written in SubC. We will generate a parser for only assignment statements. Following is the grammar for assignment statements for SubC.
|
|
Tokens/Terminal symbols of this grammar are in BOLD font. Below are the specifications for the tokens
Token Name | Pattern/Description |
---|---|
Id | Starts with a letter followed by letters, digits, or underscore. Max length 5 characters. Cannot be any of the keywords: “if”, “else”, “while”, “for”, “repeat”, “until”, “return” , “main” |
Num | Integers ‐ real numbers that include a decimal point and at least one digit to the left and right of the point |
Relop | “<“, “>”, “<=”, “>=”, “==”, “!=” |
Addop | “+”,”‐”,"\pipe\pipe" |
Mulop | “*”, “/”, “%”, “&&” |
Assignop | “=” |
Not | “!” |
If successful in parsing a list of statements the parser will print “success” otherwise will print appropriate error message.
2 Implementation
We have to write (F)LEX and YACC/BISON specification files (e.g., subc.l and subc.y). The parser that yacc/bison generates should be able to call the lexical analyzer generated by lex/flex and perform parsing of the code (i.e., list of assignment statements) generated by the given grammar. Your LEX specification should be able to remove any white space from the code. All ids should be stored in a symbol table and printed out as output.
2.1 Samples
2.1.1 Input 1
|
|
2.1.2 Output 1
|
|
2.2.1 Input 1
|
|
2.2.2 Output 2
|
|
3 Structures of Lex (analyzer) and YACC (parser)
3.1 Lex (Lexical Analyzer):
Lex is a computer program that generates lexical analyzers (what we did manually in the assignment 1 by java, but installing lex in the computer will do it automatically). Lex reads an input stream specifying the lexical analyzer and outputs source code implementing the lexer in the C programming language.
3.2 Structure of a Lex file
The structure of a Lex file is intentionally similar to that of a yacc file; files are divided into three sections, separated by lines that contain only two percent signs, as follows:
|
|
- The definition section defines macros and imports header files written in C. It is also possible to write any C code here, which will be copied verbatim into the generated source file.
- The rules section associates regular expression patterns with C statements. When the lexer sees text in the input matching a given pattern, it will execute the associated C code.
- The C code section contains C statements and functions that are copied verbatim to the generated source file. These statements presumably contain code called by the rules in the rules section. In large programs it is more convenient to place this code in a separate file linked in at compile time.
3.3 Example of a Lex file
The following is an example Lex file for the flex version of Lex. It recognizes strings of numbers (integers) in the input, and simply prints them out.
|
|
If this input is given to flex, it will be converted into a C file, lex.yy.c. This can be compiled into an executable which matches and outputs strings of integers. For example, given the input:
|
|
The program will print:
Saw an integer: 123 Saw an integer: 2 Saw an integer: 6
3.4 YACC-Yet Another Compiler Compiler (Parser)
It is a LALR parser generator, generating a parser, the part of a compiler that tries to make syntactic sense of the source code.
For example, a C program may contain something like:
|
|
In this case, the lexical analyzer would have broken the input stream into a series of “tokens”, like this:
|
|
Note that the lexical analyser (lex) has already determined that where the keyword int
appears within quotes, it is really just part of a literal string. It is up to the parser to decide if the token int
is being used as a keyword or variable. Or it may choose to reject the use of the name int
as a variable name. The parser also ensures that each statement ends with a ;
and that the brackets balance.
More precisely lexical analyzer (i.e. lex) checks correctness of the stream of characters and parser (i.e. yacc) checks correctness of the stream of tokens i.e. which token will be followed by which one etc.
3.5 Structure of a YACC source program
A YACC source program is structurally similar to a LEX one.
|
|
- The declaration section may be empty. Moreover, if the routines section is omitted, the second %% mark may be omitted also.
- Blanks, tabs, and newlines are ignored except that they may not appear in names.
THE DECLARATIONS SECTION
This section may contain the following items -
- Declarations of tokens. Yacc requires token names to be declared as such using the keyword
%token
. - Declaration of the start symbol using the keyword
%start
- C declarations: included files, global variables, types.
- C code between
%{
and%}
.
RULES SECTION
A rule has the form -
|
|
Actions may be associated with rules and are executed when the associated sentential form is matched.
LEX-YACC INTERACTION
yyparse()
calls yylex()
when it needs a new token.
LEX | YACC |
---|---|
return(TOKEN) |
%token TOKEN |
TOKEN is used in production |
The external variable yylval
-
is used in a LEX source program to return values of lexemes,
-
yylval
is assumed to be integer if you take no other action. -
Changes related to
yylval
must be made-
in the definitions section of YACC specification
-
by adding new types in the following way
%union { (type fieldname) (type fieldname) ............... }
-
and defining which token and non-terminals will use these types
1 2
%token <fieldname> token %type <fieldname> non-terminal
-
-
in LEX specification by using the fieldnames in the assignment as follows
1
yylval.fieldname = ...........
-
If you need a record type, then add it in the union. Example:
|
|
- in the LEX specification use the record name and record field in assignments:
|
|
- in the YACC rules specification use the record field only in the assignment:
|
|
assuming that $1
has the appropriate type, whatever it denotes.
4 My Source code: Simple Parser using Lex and Yacc
4.1 How to run my code
- Keep the parser folder anywhere in your computer.
- Keep the Makefile.mk file in home.
- Edit the first line of the Makefile.mk from “cd parser” to “cd path_of_the_parser_folder”
- Go to command prompt and write ./Makefile.mk
4.2 Input
- Initially input.txt is given in the parser folder. You can customize the inputs.
- There is an additional input file named tricky_input.txt that contains some tricky cases.