--accepting rule #n
Rules are numbered sequentially with the first one being 1. Rule #0
is executed when the scanner backtracks; Rule #(n+1) (where n
is the number of rules) indicates the default action; Rule #(n+2) indicates
that the input buffer is empty and needs to be refilled and then the scan
restarted. Rules beyond (n+2) are end-of-file actions.

Figure 3: Example of File Containing Lexical Analyzer
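The numbering scheme described above can be summarized in a few lines of Ada. This is only an illustrative sketch: the function Describe and the choice of four rules are hypothetical and are not part of aflex or of its generated code.

```ada
with Ada.Text_IO;

procedure Rule_Number_Demo is

   --  Hypothetical helper, not part of aflex: says what a rule number
   --  reported in the trace means for a scanner with Num_Rules rules.
   function Describe (Rule : Natural; Num_Rules : Positive) return String is
   begin
      if Rule = 0 then
         return "backtracking";
      elsif Rule <= Num_Rules then
         return "user rule" & Natural'Image (Rule);
      elsif Rule = Num_Rules + 1 then
         return "default action";
      elsif Rule = Num_Rules + 2 then
         return "refill input buffer and restart scan";
      else
         return "end-of-file action";
      end if;
   end Describe;

begin
   --  Assumption for the demo: the specification contains 4 rules.
   for Rule in 0 .. 7 loop
      Ada.Text_IO.Put_Line
        ("rule" & Natural'Image (Rule) & " : " & Describe (Rule, 4));
   end loop;
end Rule_Number_Demo;
```

With four rules, numbers 1 to 4 name the user's rules, 5 is the default action, 6 is the buffer refill, and 7 and above are end-of-file actions.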
" \ { } [ ] ^ $ < > ? . * + | ( ) /
The meaning of each operator is summarized below:
x     -- the character "x"

Rules                   Interpretations
-----                   ---------------
a or "a"                The character a
Begin or "Begin"        The string Begin
\"Begin\"               The string "Begin"
^\t or ^"\t"            The tab character \t at the beginning of line.
\n$                     The newline character \n at the end of line.

There are a few special characters which can be specified in a regular
expression:

\n    -- newline

Rules                   Interpretations
-----                   ---------------
[^abc]                  Any character except a, b, or c.
[abc]                   The single character a, b, or c.
[-+0-9]                 The - or + sign or any digit from 0 to 9.
[\t\n\b]                The tab, newline, or backspace character.
Rules                   Interpretations
-----                   ---------------
ab?c                    Matches either abc or ac.
ab.c                    Matches all strings of length 4 having a, b and
                        c as the first, second and fourth letter where the
                        third character is not a newline.

Rules                   Interpretations
-----                   ---------------
[a-z]+                  Matches all non-empty strings of lower case letters.
[A-Za-z][A-Za-z0-9]*    Matches all alphanumeric strings with a
                        leading alphabetic character.

Rules                   Interpretations
-----                   ---------------
ab|cd                   Matches either ab or cd.
(ab|cd+)?(ef)*          Matches such strings as abefef, efefef, cdef,
                        or cddd; but not abc, abcd, or abcdef.

Rules                   Interpretations
-----                   ---------------
^ab                     Matches ab at the beginning of line.
ab$                     Matches ab at the end of line.

Rules                   Interpretations
-----                   ---------------
{INTEGER}               If INTEGER is defined in the macro definition
                        section, then it will be expanded here.

definitions section
%%
rules section
%%
user defined section
##
user defined section
where %% is used as a delimiter between sections and ##
indicates where function yylex will be placed. Both %%
and ## must occur in column one.
name expression
where name must begin with a letter and contain only letters,
digits and underscores, and expression is the string of characters
that is textually substituted for name wherever it appears in the
rules section. At least one space must separate name from expression
in the definition. The expression itself is not syntax-checked;
instead, the whole rule is parsed after expansion.
The macro facility is very useful for writing regular expressions that
have common substrings, and for defining often-used ranges such as digit
and letter.
Perhaps its greatest advantage is that it gives a mnemonic name to an
otherwise obscure regular expression, making the expressions easier for
the programmer to debug. Once defined, a macro is used in a regular
expression by surrounding its name with { and }, e.g., {DIGIT}.
For example, the rules
[a-zA-Z]([0-9a-zA-Z])* {put_line ("Found an identifier");}
[0-9]+ {put_line ("Found a number");}
define identifiers and integer numbers. With macros, the source file is
LETTER [a-zA-Z]
DIGIT [0-9]
%%
{LETTER}({DIGIT}|{LETTER})* {put_line ("Found an identifier");}
{DIGIT}+ {put_line ("Found a number");}
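To make the idea of textual substitution concrete, the following Ada sketch expands {NAME} references in a pattern string. It is not aflex's implementation: the functions Definition and Expand are hypothetical, the two macros are hard-coded, and wrapping each expansion in parentheses is an assumption based on the pushed-back text quoted later in this manual.

```ada
with Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Macro_Expansion_Demo is

   --  Hypothetical stand-in for the definitions section: returns the
   --  expression for Name, or "" if Name is not defined.
   function Definition (Name : String) return String is
   begin
      if Name = "LETTER" then
         return "[a-zA-Z]";
      elsif Name = "DIGIT" then
         return "[0-9]";
      else
         return "";
      end if;
   end Definition;

   --  Textually replaces every {NAME} that has a definition by
   --  "(expression)"; every other character is copied unchanged.
   --  No syntax checking is done on the result.
   function Expand (Pattern : String) return String is
      Result  : Unbounded_String;
      I       : Natural := Pattern'First;
      Closing : Natural;
   begin
      while I <= Pattern'Last loop
         Closing := 0;
         if Pattern (I) = '{' then
            --  Look for the matching closing brace.
            for J in I + 1 .. Pattern'Last loop
               if Pattern (J) = '}' then
                  Closing := J;
                  exit;
               end if;
            end loop;
         end if;

         if Closing > I + 1
           and then Definition (Pattern (I + 1 .. Closing - 1)) /= ""
         then
            Append
              (Result,
               "(" & Definition (Pattern (I + 1 .. Closing - 1)) & ")");
            I := Closing + 1;
         else
            Append (Result, Pattern (I));
            I := I + 1;
         end if;
      end loop;
      return To_String (Result);
   end Expand;

begin
   Ada.Text_IO.Put_Line (Expand ("{LETTER}({DIGIT}|{LETTER})*"));
   Ada.Text_IO.Put_Line (Expand ("{DIGIT}+"));
end Macro_Expansion_Demo;
```

Under these assumptions the two patterns expand to ([a-zA-Z])(([0-9])|([a-zA-Z]))* and ([0-9])+, which match the same strings as the rules written without macros.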
%Start cond1 cond2 ...
where cond1 and cond2 indicate start conditions.
Note that %Start may be abbreviated as %S or %s.
ENTER(cond1);
Aflex also provides exclusive start conditions. These are
similar to normal start conditions except they have the property that
when they are active no other rules are active. Exclusive start
conditions are declared and used like normal start conditions except
that the declaration is done with %x instead of %s.
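As an illustration, the fragment below sketches how an exclusive start condition might be used to skip C-style comments. It is an untested sketch, not a complete source file: the condition names NORMAL and COMMENT are hypothetical, and it assumes that a rule is tied to a start condition by prefixing its pattern with <cond>, as in lex.

```lex
%s NORMAL
%x COMMENT
%%
"/*"            { ENTER(COMMENT); }
                -- assumption: only <COMMENT> rules are active from here on
<COMMENT>"*/"   { ENTER(NORMAL); }
                -- NORMAL is a normal start condition, so ordinary
                -- rules become active again
<COMMENT>.      { null; }
<COMMENT>\n     { null; }
                -- everything inside the comment is discarded
```

Because COMMENT is exclusive, the unprefixed "/*" rule and any other ordinary rules are inactive while a comment is being skipped. Declaring NORMAL with %s gives the scanner a named condition to return to without relying on any predefined name for the initial condition.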
pattern {action}
where pattern is a regular expression and action is an Ada
code fragment enclosed between { and }. A pattern must
always begin in column one.
%%
begin|BEGIN     {copy (yytext, buffer);
                 Install (yytext,symbol_table);
                 return RESERVED;}
recognizes the reserved word "begin" or "BEGIN", copies the
token string into the buffer, inserts it in the symbol table and returns
the value, RESERVED.
Note that the user must provide the procedures
copy and install along with all necessary types and variables
in the user defined section.
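What these user-supplied procedures look like is entirely up to the application. The following stand-alone Ada sketch shows one possible shape, assuming a fixed-size table of blank-padded names; the types Token_Text and Table, the size limits, and the parameter names are all hypothetical and are not prescribed by aflex.

```ada
with Ada.Text_IO;

procedure Symbol_Table_Demo is

   Max_Symbols : constant := 100;
   Max_Length  : constant := 32;

   --  Fixed-length, blank-padded token text; longer tokens are truncated.
   subtype Token_Text is String (1 .. Max_Length);

   type Text_Array is array (1 .. Max_Symbols) of Token_Text;
   type Table is record
      Items : Text_Array;
      Count : Natural := 0;
   end record;

   Buffer       : Token_Text;
   Symbol_Table : Table;

   --  Copies Source into Target, padding with blanks.
   procedure Copy (Source : String; Target : out Token_Text) is
      Len : constant Natural := Natural'Min (Source'Length, Max_Length);
   begin
      Target := (others => ' ');
      Target (1 .. Len) := Source (Source'First .. Source'First + Len - 1);
   end Copy;

   --  Adds Name to the table unless it is already present or the
   --  table is full.
   procedure Install (Name : String; Into : in out Table) is
      Padded : Token_Text;
   begin
      Copy (Name, Padded);
      for I in 1 .. Into.Count loop
         if Into.Items (I) = Padded then
            return;
         end if;
      end loop;
      if Into.Count < Max_Symbols then
         Into.Count := Into.Count + 1;
         Into.Items (Into.Count) := Padded;
      end if;
   end Install;

begin
   Copy ("begin", Buffer);
   Install ("begin", Symbol_Table);
   Install ("BEGIN", Symbol_Table);
   Install ("begin", Symbol_Table);   --  duplicate, ignored
   Ada.Text_IO.Put_Line ("buffer  : " & Buffer (1 .. 5));
   Ada.Text_IO.Put_Line ("symbols :" & Natural'Image (Symbol_Table.Count));
end Symbol_Table_Demo;
```

In a real specification file the declarations would go in the user defined section, and the actions would pass yytext instead of the string literals used here.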
return (token_val);
to return the appropriate token value. Ayacc creates a package
defining this token type from its specification file, which in turn
should be with'ed at the beginning of the user defined section.
Thus, this token package must be compiled before the lexical analyzer.
The user is encouraged to read the Ayacc User Manual for
more information on the interaction between aflex and ayacc.
LOWER [a-z]
UPPER [A-Z]
%%
{LOWER}+        { Lower_Case := Lower_Case + 1;
                  TEXT_IO.PUT(To_Upper_Case(Example_DFA.YYText)); }
                -- convert all alphabetic words in lower case
                -- to upper case
{UPPER}+        { Upper_Case := Upper_Case + 1;
                  TEXT_IO.PUT(Example_DFA.YYText); }
                -- write uppercase word as is
\n              { TEXT_IO.NEW_LINE;}
.               { TEXT_IO.PUT(Example_DFA.YYText); }
                -- write anything else as is
%%
with U_Env; -- VADS environment package for UNIX
procedure Example is
    type Token is (End_of_Input, Error);
    Tok : Token;
    Lower_Case : NATURAL := 0; -- frequency of lower case words
    Upper_Case : NATURAL := 0; -- frequency of upper case words
    function To_Upper_Case (Word : STRING) return STRING is
        Temp : STRING(1..Word'LENGTH);
    begin
        for i in 1.. Word'LENGTH loop
            Temp(i) := CHARACTER'VAL(CHARACTER'POS(Word(i)) - 32);
        end loop;
        return Temp;
    end To_Upper_Case;
    -- function YYlex will go here!!
##
begin -- Example
    Example_IO.Open_Input (U_Env.argv(1).s);
    Read_Input :
    loop
        Tok := YYLex;
        exit Read_Input
            when Tok = End_of_Input;
    end loop Read_Input;
    TEXT_IO.NEW_LINE;
    TEXT_IO.PUT_LINE("Number of lowercase words is => " &
                     INTEGER'IMAGE(Lower_Case));
    TEXT_IO.PUT_LINE("Number of uppercase words is => " &
                     INTEGER'IMAGE(Upper_Case));
end Example;
This source file is run through aflex using the command
% aflex example.l
Aflex produces an output file called example.a along with two packages,
example_dfa.a and example_io.a. Assuming that the main procedure, Example,
is used to construct an object file called example.out, the Unix command
% example.out example.l
prints to the screen the exact file example.l with letters in uppercase,
i.e. the output to the screen is
LOWER [A-Z]
UPPER [A-Z]
%%
{LOWER}+        { LOWER_CASE := LOWER_CASE + 1;
                  TEXT_IO.PUT(TO_UPPER_CASE(EXAMPLE_DFA.YYTEXT)); }
                -- CONVERT ALL ALPHABETIC WORDS IN LOWER CASE
                -- TO UPPER CASE
{UPPER}+        { UPPER_CASE := UPPER_CASE + 1;
                  TEXT_IO.PUT(EXAMPLE_DFA.YYTEXT); }
                -- WRITE UPPERCASE WORD AS IS
\N              { TEXT_IO.NEW_LINE;}
.               { TEXT_IO.PUT(EXAMPLE_DFA.YYTEXT); }
                -- WRITE ANYTHING ELSE AS IS
%%
WITH U_ENV; -- VADS ENVIRONMENT PACKAGE FOR UNIX
PROCEDURE EXAMPLE IS
    TYPE TOKEN IS (END_OF_INPUT, ERROR);
    TOK : TOKEN;
    LOWER_CASE : NATURAL := 0; -- FREQUENCY OF LOWER CASE WORDS
    UPPER_CASE : NATURAL := 0; -- FREQUENCY OF UPPER CASE WORDS
    FUNCTION TO_UPPER_CASE (WORD : STRING) RETURN STRING IS
        TEMP : STRING(1..WORD'LENGTH);
    BEGIN
        FOR I IN 1.. WORD'LENGTH LOOP
            TEMP(I) := CHARACTER'VAL(CHARACTER'POS(WORD(I)) - 32);
        END LOOP;
        RETURN TEMP;
    END TO_UPPER_CASE;
    -- FUNCTION YYLEX WILL GO HERE!!
##
BEGIN -- EXAMPLE
    EXAMPLE_IO.OPEN_INPUT (U_ENV.ARGV(1).S);
    READ_INPUT :
    LOOP
        TOK := YYLEX;
        EXIT READ_INPUT
            WHEN TOK = END_OF_INPUT;
    END LOOP READ_INPUT;
    TEXT_IO.NEW_LINE;
    TEXT_IO.PUT_LINE("NUMBER OF LOWERCASE WORDS IS => " &
                     INTEGER'IMAGE(LOWER_CASE));
    TEXT_IO.PUT_LINE("NUMBER OF UPPERCASE WORDS IS => " &
                     INTEGER'IMAGE(UPPER_CASE));
END EXAMPLE;
Number of lowercase words is => 144
Number of uppercase words is => 120
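The To_Upper_Case function in the example subtracts 32 from each character position, which is correct only for lower case ASCII letters (all that the {LOWER}+ rule passes to it), and it indexes Word from 1. A more defensive variant might look like the sketch below. It assumes a compiler that provides Ada.Characters.Handling (Ada 95 or later); the demonstration string is hypothetical.

```ada
with Ada.Text_IO;
with Ada.Characters.Handling;

procedure Upper_Case_Demo is

   --  Converts every lower case letter in Word to upper case and
   --  leaves all other characters unchanged. Using Word'Range means
   --  the function also works when Word is a slice whose first index
   --  is not 1.
   function To_Upper_Case (Word : String) return String is
      Temp : String (Word'Range);
   begin
      for I in Word'Range loop
         Temp (I) := Ada.Characters.Handling.To_Upper (Word (I));
      end loop;
      return Temp;
   end To_Upper_Case;

   Line : constant String := "xx begin 42";

begin
   --  Line (4 .. 8) is the slice "begin"; its first index is 4.
   Ada.Text_IO.Put_Line (To_Upper_Case (Line (4 .. 8)));
   --  Digits and blanks pass through unchanged.
   Ada.Text_IO.Put_Line (To_Upper_Case (Line));
end Upper_Case_Demo;
```

Ada.Characters.Handling also provides a To_Upper that takes a whole String, so the loop could be replaced by a single call; it is kept here to mirror the structure of the original function.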
definitions section
%%
rules section
%%
user defined section
## -- places yylex function
user defined section
Create_Output("/dev/tty");
be made. This will still work, but because of differences in
implementation it may cause difficulties in redirecting output with
Unix shell pipes and redirection. Instead, simply do not call
Create_Output; output will then go to the default standard_output.
{DIG} [0-9] -- a digit
In which the pushed-back text is "([0-9] -- a digit)".