The grammar is specified by first giving the list of terminals and the list of non-terminal definitions. Each non-terminal definition is a list where the first element is the non-terminal and the other elements are the right-hand sides (lists of grammar symbols). In addition to this, each rhs can be followed by a semantic action.
For example, consider the following (yacc) grammar for a very simple expression language:
e : e '+' t | e '-' t | t ; t : t '*' f : t '/' f | f ; f : ID ;
The same grammar, written for the scheme parser generator, would look like this (with semantic actions)
(define expr-parser (lalr-parser ; Terminal symbols (ID + - * /) ; Productions (e (e + t) : (+ $1 $3) (e - t) : (- $1 $3) (t) : $1) (t (t * f) : (* $1 $3) (t / f) : (/ $1 $3) (f) : $1) (f (ID) : $1)))
In semantic actions, the symbol $n
refers to the synthesized
attribute value of the nth symbol in the production. The value
associated with the non-terminal on the left is the result of
evaluating the semantic action (it defaults to #f
).
The above grammar implicitly handles operator precedences. It is also possible to explicitly assign precedences and associativity to terminal symbols and productions à la Yacc. Here is a modified (and augmented) version of the grammar:
(define expr-parser (lalr-parser ; Terminal symbols (ID (left: + -) (left: * /) (nonassoc: uminus)) (e (e + e) : (+ $1 $3) (e - e) : (- $1 $3) (e * e) : (* $1 $3) (e / e) : (/ $1 $3) (- e (prec: uminus)) : (- $2) (ID) : $1)))
The left:
directive is used to specify a set of
left-associative operators of the same precedence level, the
right:
directive for right-associative operators, and
nonassoc:
for operators that are not associative. Note
the use of the (apparently) useless terminal uminus.
It
is only defined in order to assign to the penultimate rule a
precedence level higher than that of *
and
/.
The prec:
directive can only appear as
the last element of a rule. Finally, note that precedence levels are
incremented from left to right, i.e. the precedence level of
+
and -
is less than the precedence level of
*
and /
since the formers appear first in
the list of terminal symbols (token definitions).
The following options are available.
output:
namefilename) - copies the parser to the given
file. The parser is given the name name. out-tables:
filename) - outputs the parsing tables in
filename in a more readable format expect:
n) - don't warn about conflicts if there are
n or less conflicts.lalr-scm
implements a very simple error recovery
strategy. A production can be of the form
(rulename ... (error TERMINAL) : action-code
(There can be several such productions for a single rulename.) This will cause the parser to skip all the tokens produced by the lexer that are different than the given TERMINAL. For a C-like language, one can synchronize on semicolons and closing curly brackets by writing error rules like these:
(stmt (expression SEMICOLON) : ... (LBRACKET stmt RBRACKET) : ... (error SEMICOLON) (error RBRACKET))
Conflicts in the grammar are handled in a conventional way. In the absence of precedence directives, Shift/Reduce conflicts are resolved by shifting, and Reduce/Reduce conflicts are resolved by choosing the rule listed first in the grammar definition.