TCLLk: An LL(k) parser generator
For the software and documentation, click
here.
For a paper (in Postscript): A Strong LL(k) Parser Generator That
Accepts Non-LL Grammars and Generates LL(1) Tables: Extended Abstract
LALR(1) parser generators, for example YACC and BISON, have been used in
compiler-writing for a number of years because of their generality. The class of grammars
they accept is adequately large to handle most programming languages without requiring
much expertise on the part of the compiler writer. But LALR(1) parsing tables tend to be
large and their error recovery facilities, poor.
I have recently developed an LL(k) parsing algorithm that promises to make LALR(1)
obsolete. It has the following characteristics:
- It seems about as general as LALR(1), accepting programming language grammars with few
changes.
- It has the excellent possibilities of error recovery common to LL parsing algorithms.
- It may produce smaller tables than LALR(1), although this has yet to be determined for a
reasonably large test set of programming language grammars.
The current version of the LL(k) parser generator is written in Icon. It's algorithm in
very general terms is:
- Analyze the grammar and stop if it is not reduced.
- Check for and remove left recursion.
- Factor. Do both deep and shallow factoring:
- Shallow factoring consolidates several productions that begin with the same sequence of
symbols.
- Deep factoring splits productions by replacing initial nonterminals with their
right-hand sides in hopes of eventually generating right hand sides that can be
(shallowly) factored. Unfortunately, deep factoring may produce more productions that have
to be deeply factored. The algorithm for factoring could be used to detect whether a
context-free grammar is ambiguous, which is not computationally possible. The algorithm,
therefore, will stop trying after a while if it hasn't succeeded.
- Try to build an LL(1) parser for the rewritten grammar. If that doesn't work, the parser
generator will have to resort to one of two expedients:
- If this is a case of a dangling tail (e.g. a "dangling-else"), the parser will
just use the non-empty-deriving alternative.
- Otherwise, build look-ahead trees out of productions. When it has found the correct
right hand side (RHS) to use, the parser will back up and try that RHS. If it can't
determine which RHS to use in k or fewer levels of look-ahead tree, the grammar is not
LL(k).