You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucy.apache.org by Marvin Humphrey <ma...@rectangular.com> on 2012/07/18 17:53:03 UTC

[lucy-dev] Lemon and yacc

Greets,

Some followup from last night's Lucy Book Club...

The Lemon parser generator and the Unix utility yacc both support LALR(1)
grammars, but the input formats differ somewhat. Here are some of the
superficial disparities:

  * Lemon does not support the vertical bar for alternation within rules.
  * Lemon does not require pre-declaration of tokens.
  * Lemon files do not have two distinct sections separated by the `%%` token.
  * Lemon does not support single-quoted characters as terminals -- all
    terminals must be spelled out.

None of those differences constrain the kinds of constructs that can be parsed
-- they are merely different flavors of syntactic sugar.

Here's a minimal yacc grammar which is designed to illustrate the "dangling
else"[1] shift-reduce conflict:

    %token IF
    %token ELSE

    %%

    statement
        : ';'    /* empty statement */
        | IF statement
        | IF statement ELSE statement
        ;

Here's an equivalent grammar for Lemon:

    start ::= statement.
    statement ::= SEMICOLON.      /* Empty statement */
    statement ::= IF statement.
    statement ::= IF statement ELSE statement.

For curiosity's sake, here is an equivalent in the original BNF as used in the
Algol-60 paper[2]:

    <statement> ::= ; | if <statement> | if <statement> else <statement>

The verbose debugging output produced by the two parser generators is also
quite similar. Here's an excerpt from the "y.output" file produced by
`yacc -v grammar.y`...

    State 4 conflicts: 1 shift/reduce

    [...]

    state 4

        2 statement: IF statement .
        3          | IF statement . ELSE statement

        ELSE  shift, and go to state 6

        ELSE      [reduce using rule 2 (statement)]
        $default  reduce using rule 2 (statement)

... and here's an analogous excerpt from the "grammar.out" file produced by
`lemon grammar.y` (debugging is enabled by default in Lemon):

    State 3:
          (2) statement ::= IF statement *
              statement ::= IF statement * ELSE statement

                      ELSE shift  1
                      ELSE reduce 2   ** Shift-reduce parsing conflict **
                 {default} reduce 2

What both of these excerpts illustrate is that the parser is not sure what to
do when it has accepted "IF statement" and then encounters an "ELSE".  (The
grammar is ambiguous, as it can produce multiple derivations for "IF statement
ELSE IF statement ELSE statement" -- should the last ELSE clause be attached
to the inner IF or the outer IF?)  There are minor differences, such as `.`
vs. `*` for indicating the parser's location in the token stream, but the
meaning is effectively the same.

Marvin Humphrey

[1] http://en.wikipedia.org/wiki/Dangling_else
[2] http://www.masswerk.at/algol60/report.htm