Code Comments
Programming Forum and web based access to our favorite programming groups.Hello all, I'd like to solicit opinions on a particular idea from people with experience in the compilers field. I'm in the process of writing a generic SLR parser. It takes as input SLR parsing tables and a string to be parsed. Currently, it's output is the sequence of rule invocations that will yield the input string (or an error if the string is not in the language). The question I have is with regard to the output. I'm considering also having it output a parse tree. Of course, as a generic library, the only thing it could do would be to output a parse tree that exactly mimicked the grammar and the particular derivation arrived at. So, leaves would represent tokens and interior vertices and the edges coming from them would represent a grammar rule invocation. Often times, people want a parse tree that is more compact than that described above. For example, parsing the expression "5 + 5" might yield a tree with three vertices: a root node representing "+" with two children each representing 5. My approach would yield a much larger tree if the grammar had many levels getting down to a number token. And this would typically be the case in a grammar representing expressions with operators at many different precedence levels... So, my question is: How useful would my idea be? In what contexts would it be useful to have a tree that mimicked the grammar and a particular derivation with that grammar? In what contexts would it not be useful? Is it worth implementing? I appreciate any input. Thanks, Dave
Post Follow-up to this message"Dave" <better_cs_now@yahoo.com> wrote in message > I'd like to solicit opinions on a particular idea from people with > experience in the compilers field. > > I'm in the process of writing a generic SLR parser. ... > > The question I have is with regard to the output. I'm considering > also having it output a parse tree. Of course, as a generic library, > the only thing it could do would be to output a parse tree that > exactly mimicked the grammar and the particular derivation arrived at. The DMS Software Reengineering Toolkit does exactly this. It also has options to build more compressed trees by removing unary productions and eliminating non-value-bearing terminals. We use the compressed versions to help us efficiently process very big sets of source files. At the physical tree level, using the procedural tree interface, these differences are visible, and it can be problematic to write code to handle the various representations. For many tasks (especially information extraction tasks) this really isn't an issue; having a "GetNthChild" call is useful simply to walk the tree when you are looking for something, and unary productions and terminals aren't generally very interesting. The attribute evaluator and source-to-source rewrite aspects of DMS, however, offer a compression-independent view of the trees. This is really nice; you can have compact trees and easily derived representation which is useful. DMS has been used for a variety of software analysis, mass change, and transformation tasks. So I think it is reasonable to say this is a useful idea :-) -- Ira D. Baxter, Ph.D., CTO 512-250-1018 Semantic Designs, Inc. www.semdesigns.com
Post Follow-up to this messageDave wrote: > I'm in the process of writing a generic SLR parser. It takes as input > SLR parsing tables and a string to be parsed. If its input is parse tables, then there is very little work to make it LALR or full LR(1) as well. The big difference between SLR, LR, and LALR is in the table generation, not the actual parsing machinery. > So, my question is: How useful would my idea be? In what contexts > would it be useful to have a tree that mimicked the grammar and a > particular derivation with that grammar? Useful for educational purposes. And, perhaps, small simple grammars. > In what contexts would it not be useful? Anywhere were a sensible syntax-tree is preferred (e.g. compiler/translator). Because, I am assuming, the parse tables that are your parser's input don't have abstract syntax tree (AST) directions, you'd be stuck with massive trees. I wrote a parser-generator once that automatically created ASTs, and struck this very problem. I had to add AST directions to the parser-generator input file to get a sensible sized-tree that I could walk over in a reasonable time. It was a huge language though - produced a parsing automata 4 times the size of Java. > Is it worth implementing? Sure. Go for it. Cheers, Carl.
Post Follow-up to this message"Carl Cerecke" <cdc@maxnet.co.nz> wrote in message > > If its input is parse tables, then there is very little work to make it > LALR or full LR(1) as well. The big difference between SLR, LR, and LALR > is in the table generation, not the actual parsing machinery. > > > Useful for educational purposes. And, perhaps, small simple grammars. > > > Anywhere were a sensible syntax-tree is preferred (e.g. > compiler/translator). Because, I am assuming, the parse tables that > are your parser's input don't have abstract syntax tree (AST) > directions, you'd be stuck with massive trees. > > I wrote a parser-generator once that automatically created ASTs, and > struck this very problem. I had to add AST directions to the > parser-generator input file to get a sensible sized-tree that I could > walk over in a reasonable time. It was a huge language though - > produced a parsing automata 4 times the size of Java. > > > Sure. Go for it. > > Cheers, > Carl. I've thought about the LALR thing you mentioned. I'm also generating the parse tables (separate library from the parser), so that's where the change would need to be. How difficult is it to upgrade an SLR table generator to an LALR table generator? I've also been thinking more about the parse tree vs. syntax tree thing. I would like to automate the transformation of a parse tree into a syntax tree. But it's not clear to me if this can be done by simply collapsing long chains of unit productions or not. There might be more to it to generate a minimal and completely natural syntax tree. It may be that the user of the library needs to provide rules for the transformation. If so, what would these rules look like? I'd love to hear from anybody that has experience in doing this and can offer sound guidance! Thanks, Dave [I think you need explicit hints in the grammar to produce an AST, so you can tell the difference between, say, (exp) which is just grouping and [exp] which is a unary operator. -John]
Post Follow-up to this message> How difficult is it to upgrade an SLR table generator to an LALR
> table generator?
Difficult. There are pitfalls such as the NQLALR (not quite LALR)
problem.
> It may be that the user of the library needs to provide rules for
> the transformation.
Yes, the user provides node names, at least.
> If so, what would these rules look like? I'd love to hear from
> anybody that has experience in doing this and can offer sound
> guidance! >
I did it this way in a commercial PG I wrote. The AST turns out to be
what you want.
// Operator precedence goes here
// Grammar:
Goal -> Stmt... <eof>
Stmt -> Exp ';'
-> Target '=' Exp ';' +> assign
Target -> <identifier> +> target
Exp -> Primary
-> Exp '+' Exp +> add
-> Exp '-' Exp +> sub
-> Exp '*' Exp +> mul
-> Exp '/' Exp +> div
Primary -> <identifier> +> ident
-> '(' Exp ')'
-> '+' Primary
-> '-' Primary +> neg
-> '@' Primary +> addr
// End.
Paul B Mann
paulbmann@yahoo.com
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread
Powered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.