Code Comments

Programming Forum and web based access to our favorite programming groups.
For Programmers: Free Programming Magazines
Registration is free! Edit your profileCalendarFind other membersFrequently Asked QuestionsSearch -> 
Post New Thread











Thread
Author

Usefulness of automatic parse tree generation
Hello all,

I'd like to solicit opinions on a particular idea from people with
experience in the compilers field.

I'm in the process of writing a generic SLR parser.  It takes as input
SLR parsing tables and a string to be parsed.  Currently, it's output
is the sequence of rule invocations that will yield the input string
(or an error if the string is not in the language).

The question I have is with regard to the output.  I'm considering
also having it output a parse tree.  Of course, as a generic library,
the only thing it could do would be to output a parse tree that
exactly mimicked the grammar and the particular derivation arrived at.
So, leaves would represent tokens and interior vertices and the edges
coming from them would represent a grammar rule invocation.

Often times, people want a parse tree that is more compact than that
described above.  For example, parsing the expression "5 + 5" might
yield a tree with three vertices: a root node representing "+" with
two children each representing 5.  My approach would yield a much
larger tree if the grammar had many levels getting down to a number
token.  And this would typically be the case in a grammar representing
expressions with operators at many different precedence levels...

So, my question is: How useful would my idea be?  In what contexts
would it be useful to have a tree that mimicked the grammar and a
particular derivation with that grammar?  In what contexts would it
not be useful?  Is it worth implementing?

I appreciate any input.

Thanks,
Dave

Report this thread to moderator Post Follow-up to this message
Old Post
Dave
07-13-04 09:01 PM


Re: Usefulness of automatic parse tree generation
"Dave" <better_cs_now@yahoo.com> wrote in message

> I'd like to solicit opinions on a particular idea from people with
> experience in the compilers field.
>
> I'm in the process of writing a generic SLR parser.  ...
>
> The question I have is with regard to the output.  I'm considering
> also having it output a parse tree.  Of course, as a generic library,
> the only thing it could do would be to output a parse tree that
> exactly mimicked the grammar and the particular derivation arrived at.

The DMS Software Reengineering Toolkit does exactly this.  It also has
options to build more compressed trees by removing unary productions
and eliminating non-value-bearing terminals.  We use the compressed
versions to help us efficiently process very big sets of source files.

At the physical tree level, using the procedural tree interface, these
differences are visible, and it can be problematic to write code to
handle the various representations.  For many tasks (especially
information extraction tasks) this really isn't an issue; having a
"GetNthChild" call is useful simply to walk the tree when you are
looking for something, and unary productions and terminals aren't
generally very interesting.

The attribute evaluator and source-to-source rewrite aspects of DMS,
however, offer a compression-independent view of the trees.  This is
really nice; you can have compact trees and
easily derived representation which is useful.

DMS has been used for a variety of software analysis, mass change, and
transformation tasks.  So I think it is reasonable to say this is a
useful idea :-)
--
Ira D. Baxter, Ph.D., CTO   512-250-1018
Semantic Designs, Inc.      www.semdesigns.com

Report this thread to moderator Post Follow-up to this message
Old Post
Ira Baxter
07-14-04 08:57 AM


Re: Usefulness of automatic parse tree generation
Dave wrote:

> I'm in the process of writing a generic SLR parser.  It takes as input
> SLR parsing tables and a string to be parsed.

If its input is parse tables, then there is very little work to make it
LALR or full LR(1) as well. The big difference between SLR, LR, and LALR
is in the table generation, not the actual parsing machinery.

> So, my question is: How useful would my idea be?  In what contexts
> would it be useful to have a tree that mimicked the grammar and a
> particular derivation with that grammar?

Useful for educational purposes. And, perhaps, small simple grammars.

> In what contexts would it not be useful?

Anywhere were a sensible syntax-tree is preferred (e.g.
compiler/translator). Because, I am assuming, the parse tables that
are your parser's input don't have abstract syntax tree (AST)
directions, you'd be stuck with massive trees.

I wrote a parser-generator once that automatically created ASTs, and
struck this very problem. I had to add AST directions to the
parser-generator input file to get a sensible sized-tree that I could
walk over in a reasonable time. It was a huge language though -
produced a parsing automata 4 times the size of Java.

> Is it worth implementing?

Sure. Go for it.

Cheers,
Carl.

Report this thread to moderator Post Follow-up to this message
Old Post
Carl Cerecke
07-14-04 08:57 AM


Re: Usefulness of automatic parse tree generation
"Carl Cerecke" <cdc@maxnet.co.nz> wrote in message
 
>
> If its input is parse tables, then there is very little work to make it
> LALR or full LR(1) as well. The big difference between SLR, LR, and LALR
> is in the table generation, not the actual parsing machinery.
> 
>
> Useful for educational purposes. And, perhaps, small simple grammars.
> 
>
> Anywhere were a sensible syntax-tree is preferred (e.g.
> compiler/translator). Because, I am assuming, the parse tables that
> are your parser's input don't have abstract syntax tree (AST)
> directions, you'd be stuck with massive trees.
>
> I wrote a parser-generator once that automatically created ASTs, and
> struck this very problem. I had to add AST directions to the
> parser-generator input file to get a sensible sized-tree that I could
> walk over in a reasonable time. It was a huge language though -
> produced a parsing automata 4 times the size of Java.
> 
>
> Sure. Go for it.
>
> Cheers,
> Carl.

I've thought about the LALR thing you mentioned.  I'm also generating
the parse tables (separate library from the parser), so that's where
the change would need to be.  How difficult is it to upgrade an SLR
table generator to an LALR table generator?

I've also been thinking more about the parse tree vs. syntax tree
thing.  I would like to automate the transformation of a parse tree
into a syntax tree.  But it's not clear to me if this can be done by
simply collapsing long chains of unit productions or not.  There might
be more to it to generate a minimal and completely natural syntax
tree.  It may be that the user of the library needs to provide rules
for the transformation.  If so, what would these rules look like?  I'd
love to hear from anybody that has experience in doing this and can
offer sound guidance!

Thanks,
Dave
[I think you need explicit hints in the grammar to produce an AST, so
you can tell the difference between, say, (exp) which is just grouping
and [exp] which is a unary operator. -John]


Report this thread to moderator Post Follow-up to this message
Old Post
Dave
07-18-04 01:58 AM


Re: Usefulness of automatic parse tree generation
> How difficult is it to upgrade an SLR table generator to an LALR
> table generator?

Difficult.  There are pitfalls such as the NQLALR (not quite LALR)
problem.

> It may be that the user of the library needs to provide rules for
> the transformation.

Yes, the user provides node names, at least.

> If so, what would these rules look like?  I'd love to hear from
> anybody that has experience in doing this and can offer sound
> guidance!  >

I did it this way in a commercial PG I wrote.  The AST turns out to be
what you want.

// Operator precedence goes here

// Grammar:

Goal    -> Stmt... <eof>

Stmt    -> Exp ';'
-> Target '=' Exp ';'      +> assign

Target  -> <identifier>            +> target

Exp     -> Primary
-> Exp '+' Exp             +> add
-> Exp '-' Exp             +> sub
-> Exp '*' Exp             +> mul
-> Exp '/' Exp             +> div

Primary -> <identifier>            +> ident
-> '(' Exp ')'
-> '+' Primary
-> '-' Primary             +> neg
-> '@' Primary             +> addr

// End.

Paul B Mann
paulbmann@yahoo.com

Report this thread to moderator Post Follow-up to this message
Old Post
Paul B Mann
07-29-04 02:08 AM


Sponsored Links




Last Thread Next Thread Next
Search this forum -> 
Post New Thread

Compilers archive

Show a Printable Version Send to friend Email This Page to Someone! subscribe to this thread Receive updates to this thread
Computer Consultants
Programming Jobs
Visual Basic Controls
SQL Server Programming
Webservices
Java Security
Visual Studio
C# Programming
Visual J++
Software engineering
Open source Software
Perl Programming
PHP Programming
ASP Programming
ASP .NET Programming
Visual Basic Programming
Windows Scripting Host
Java Programming
Java Help
Java Beans
VBScript
Cobol
MAC Applications
Unix Programming
Forum Jump:
All times are GMT. The time now is 03:13 PM.

 

Programming forum archive

Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.