For Programmers: Free Programming Magazines  


Home > Archive > Compilers > July 2004 > Usefulness of automatic parse tree generation









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Usefulness of automatic parse tree generation
Dave

2004-07-13, 4:01 pm

Hello all,

I'd like to solicit opinions on a particular idea from people with
experience in the compilers field.

I'm in the process of writing a generic SLR parser. It takes as input
SLR parsing tables and a string to be parsed. Currently, it's output
is the sequence of rule invocations that will yield the input string
(or an error if the string is not in the language).

The question I have is with regard to the output. I'm considering
also having it output a parse tree. Of course, as a generic library,
the only thing it could do would be to output a parse tree that
exactly mimicked the grammar and the particular derivation arrived at.
So, leaves would represent tokens and interior vertices and the edges
coming from them would represent a grammar rule invocation.

Often times, people want a parse tree that is more compact than that
described above. For example, parsing the expression "5 + 5" might
yield a tree with three vertices: a root node representing "+" with
two children each representing 5. My approach would yield a much
larger tree if the grammar had many levels getting down to a number
token. And this would typically be the case in a grammar representing
expressions with operators at many different precedence levels...

So, my question is: How useful would my idea be? In what contexts
would it be useful to have a tree that mimicked the grammar and a
particular derivation with that grammar? In what contexts would it
not be useful? Is it worth implementing?

I appreciate any input.

Thanks,
Dave
Ira Baxter

2004-07-14, 3:57 am

"Dave" <better_cs_now@yahoo.com> wrote in message

> I'd like to solicit opinions on a particular idea from people with
> experience in the compilers field.
>
> I'm in the process of writing a generic SLR parser. ...
>
> The question I have is with regard to the output. I'm considering
> also having it output a parse tree. Of course, as a generic library,
> the only thing it could do would be to output a parse tree that
> exactly mimicked the grammar and the particular derivation arrived at.


The DMS Software Reengineering Toolkit does exactly this. It also has
options to build more compressed trees by removing unary productions
and eliminating non-value-bearing terminals. We use the compressed
versions to help us efficiently process very big sets of source files.

At the physical tree level, using the procedural tree interface, these
differences are visible, and it can be problematic to write code to
handle the various representations. For many tasks (especially
information extraction tasks) this really isn't an issue; having a
"GetNthChild" call is useful simply to walk the tree when you are
looking for something, and unary productions and terminals aren't
generally very interesting.

The attribute evaluator and source-to-source rewrite aspects of DMS,
however, offer a compression-independent view of the trees. This is
really nice; you can have compact trees and
easily derived representation which is useful.

DMS has been used for a variety of software analysis, mass change, and
transformation tasks. So I think it is reasonable to say this is a
useful idea :-)
--
Ira D. Baxter, Ph.D., CTO 512-250-1018
Semantic Designs, Inc. www.semdesigns.com
Carl Cerecke

2004-07-14, 3:57 am

Dave wrote:

> I'm in the process of writing a generic SLR parser. It takes as input
> SLR parsing tables and a string to be parsed.


If its input is parse tables, then there is very little work to make it
LALR or full LR(1) as well. The big difference between SLR, LR, and LALR
is in the table generation, not the actual parsing machinery.

> So, my question is: How useful would my idea be? In what contexts
> would it be useful to have a tree that mimicked the grammar and a
> particular derivation with that grammar?


Useful for educational purposes. And, perhaps, small simple grammars.

> In what contexts would it not be useful?


Anywhere were a sensible syntax-tree is preferred (e.g.
compiler/translator). Because, I am assuming, the parse tables that
are your parser's input don't have abstract syntax tree (AST)
directions, you'd be stuck with massive trees.

I wrote a parser-generator once that automatically created ASTs, and
struck this very problem. I had to add AST directions to the
parser-generator input file to get a sensible sized-tree that I could
walk over in a reasonable time. It was a huge language though -
produced a parsing automata 4 times the size of Java.

> Is it worth implementing?


Sure. Go for it.

Cheers,
Carl.
Dave

2004-07-17, 8:58 pm

"Carl Cerecke" <cdc@maxnet.co.nz> wrote in message

>
> If its input is parse tables, then there is very little work to make it
> LALR or full LR(1) as well. The big difference between SLR, LR, and LALR
> is in the table generation, not the actual parsing machinery.
>
>
> Useful for educational purposes. And, perhaps, small simple grammars.
>
>
> Anywhere were a sensible syntax-tree is preferred (e.g.
> compiler/translator). Because, I am assuming, the parse tables that
> are your parser's input don't have abstract syntax tree (AST)
> directions, you'd be stuck with massive trees.
>
> I wrote a parser-generator once that automatically created ASTs, and
> struck this very problem. I had to add AST directions to the
> parser-generator input file to get a sensible sized-tree that I could
> walk over in a reasonable time. It was a huge language though -
> produced a parsing automata 4 times the size of Java.
>
>
> Sure. Go for it.
>
> Cheers,
> Carl.


I've thought about the LALR thing you mentioned. I'm also generating
the parse tables (separate library from the parser), so that's where
the change would need to be. How difficult is it to upgrade an SLR
table generator to an LALR table generator?

I've also been thinking more about the parse tree vs. syntax tree
thing. I would like to automate the transformation of a parse tree
into a syntax tree. But it's not clear to me if this can be done by
simply collapsing long chains of unit productions or not. There might
be more to it to generate a minimal and completely natural syntax
tree. It may be that the user of the library needs to provide rules
for the transformation. If so, what would these rules look like? I'd
love to hear from anybody that has experience in doing this and can
offer sound guidance!

Thanks,
Dave
[I think you need explicit hints in the grammar to produce an AST, so
you can tell the difference between, say, (exp) which is just grouping
and [exp] which is a unary operator. -John]

Paul B Mann

2004-07-28, 9:08 pm

> How difficult is it to upgrade an SLR table generator to an LALR
> table generator?


Difficult. There are pitfalls such as the NQLALR (not quite LALR)
problem.

> It may be that the user of the library needs to provide rules for
> the transformation.


Yes, the user provides node names, at least.

> If so, what would these rules look like? I'd love to hear from
> anybody that has experience in doing this and can offer sound
> guidance! >


I did it this way in a commercial PG I wrote. The AST turns out to be
what you want.

// Operator precedence goes here

// Grammar:

Goal -> Stmt... <eof>

Stmt -> Exp ';'
-> Target '=' Exp ';' +> assign

Target -> <identifier> +> target

Exp -> Primary
-> Exp '+' Exp +> add
-> Exp '-' Exp +> sub
-> Exp '*' Exp +> mul
-> Exp '/' Exp +> div

Primary -> <identifier> +> ident
-> '(' Exp ')'
-> '+' Primary
-> '-' Primary +> neg
-> '@' Primary +> addr

// End.

Paul B Mann
paulbmann@yahoo.com
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2010 codecomments.com