The first phase of compilation is called scanning or lexical analysis. This phase interprets the input program as a sequence of characters and produces a sequence of tokens that will be used by the parser.
Write a C++ program that implements a scanner for a language whose tokens are defined below:
<Keyword> → if | then | else | begin | end | program
<Identifier> → <Char> | <Char> <Identifier>
<Integer> → <Digit> | <Digit> <Integer>
<Special> → ( | ) | [ | ] | + | - | = | , | ;
<Digit> → 0|1|2|3|4|5|6|7|8|9
<Char> → a|b|c|…|z|A|B|…|Z
The token classes to be recognized are Keyword, Identifier, Integer, and Special. Tokens are separated by whitespace (blanks, newlines, and tabs) and/or special characters. The language is NOT case sensitive, so you can, and probably should, convert all non-numeric tokens to lowercase before storing them.
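The token rules above could be sketched roughly as follows. This is only one possible approach, not a required design: the names `TokenClass`, `Token`, `isKeyword`, `isSpecial`, and `tokenize` are hypothetical, the scanner works on a string that has already been read into memory, and it assumes that a run of letters ends at the first non-letter (so it does not enforce the 15-character limit on identifiers).

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

enum class TokenClass { Keyword, Identifier, Integer, Special };

struct Token {
    TokenClass cls;
    std::string text;  // letters are stored in lowercase
};

// True if the (already lowercased) word is one of the six keywords.
bool isKeyword(const std::string& word) {
    static const char* const kKeywords[] = {"if", "then", "else",
                                            "begin", "end", "program"};
    for (const char* k : kKeywords)
        if (word == k) return true;
    return false;
}

// True if c is one of the special characters of the grammar.
bool isSpecial(char c) {
    return std::string("()[]+-=,;").find(c) != std::string::npos;
}

// Splits the input into tokens: whitespace is skipped, a run of letters
// becomes a Keyword or Identifier, a run of digits becomes an Integer,
// and any other character is treated as a one-character Special token.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalpha(c)) {
            std::string word;
            while (i < input.size() &&
                   std::isalpha(static_cast<unsigned char>(input[i]))) {
                word += static_cast<char>(
                    std::tolower(static_cast<unsigned char>(input[i])));
                ++i;
            }
            TokenClass cls = isKeyword(word) ? TokenClass::Keyword
                                             : TokenClass::Identifier;
            tokens.push_back({cls, word});
        } else if (std::isdigit(c)) {
            std::string digits;
            while (i < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[i]))) {
                digits += input[i];
                ++i;
            }
            tokens.push_back({TokenClass::Integer, digits});
        } else {
            // The input is assumed to be correct, so the only remaining
            // possibility is a special character.
            assert(isSpecial(input[i]));
            tokens.push_back({TokenClass::Special, std::string(1, input[i])});
            ++i;
        }
    }
    return tokens;
}
```

With this sketch, calling `tokenize` on the text `Begin Count = 10 ; END` yields six tokens, with `begin` and `end` classified as keywords and `count` as an identifier, regardless of the case used in the input.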
You may assume that:
· The input program is syntactically correct.
· There are fewer than 1000 distinct tokens.
· Each identifier has up to 15 characters.
· The long int data type of C++ is sufficient to represent any of the integers.
Your program should read its input from a file named “scan.in” and build a symbol table that contains one entry for each distinct token found in the input. You may use any data structure for the symbol table (e.g., an array of struct), although compilers often use a hash table. After all the input has been read, your program should write a summary report to a file named “scan.out” that lists each token that appeared in the input, the number of times it appeared, and its classification. The last line of the output file should contain the sum of all the integers in the table. (This is just to ensure that the integers are read and stored as integers in your program.)
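A symbol table and report along these lines might look like the sketch below. It is an illustration under several assumptions, not the required solution: the names `Entry`, `SymbolTable`, `record`, `integerSum`, and `formatReport` are hypothetical, the table is a `std::map` keyed by the token text (a hash table such as `std::unordered_map` would work equally well), the classification is passed in as a plain string, and the tab-separated report layout is invented here because the assignment does not fix one. Since entries are keyed by text, integers written differently (for example with leading zeros) would be stored as separate entries.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// One symbol-table entry: the token's class and how often it occurred.
struct Entry {
    std::string cls;
    long count = 0;
};

class SymbolTable {
public:
    // Adds one occurrence of the token, creating its entry if needed.
    void record(const std::string& text, const std::string& cls) {
        assert(!text.empty());
        Entry& e = entries_[text];
        e.cls = cls;
        ++e.count;
    }

    // Sums the numeric value of every distinct Integer entry in the table.
    long integerSum() const {
        long sum = 0;
        for (const auto& kv : entries_)
            if (kv.second.cls == "Integer") sum += std::stol(kv.first);
        return sum;
    }

    // Builds the report text: one "token<TAB>count<TAB>class" line per
    // entry (in the map's sorted key order), then the sum of the integers.
    std::string formatReport() const {
        std::ostringstream out;
        for (const auto& kv : entries_)
            out << kv.first << '\t' << kv.second.count << '\t'
                << kv.second.cls << '\n';
        out << "Sum of integers: " << integerSum() << '\n';
        return out.str();
    }

private:
    std::map<std::string, Entry> entries_;
};
```

In a full program, the string returned by `formatReport` would be written to “scan.out” through a `std::ofstream` once the whole of “scan.in” has been scanned. Converting with `std::stol` when computing the sum is what shows that the integers are handled as numbers rather than as text.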