For Programmers: Free Programming Magazines  


Home > Archive > Compilers > April 2006 > Object-oriented AST









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Object-oriented AST
Thomas Weiss

2006-04-08, 7:03 pm

Hello,

Absolute beginner in compiler construction, I am looking for an
efficient way to design abstract syntax trees in object-oriented
languages. The interpreter design pattern tells us that the BNF rule
a : b | c can lead to the following design :
class a {}
class b : a {}
class c : a {}

That's fine, b & c are indeed specialized versions of a. But BNF
grammars are rarely simple. Especially, how should be designed an AST
reflecting the following rules :
a : b | c
d : e | c
?

Thank you for your help,

Thomas
Florian Liekweg

2006-04-09, 7:06 pm

Thomas Weiss wrote:
> Hello,
>
> [...] I am looking for an
> efficient way to design abstract syntax trees in object-oriented
> languages.


Hello, Thomas,

From your example, it seems that you are trying to turn every BNF rule
into an inheritance relationship. I would advise against doing this
in this strictness. It seems more sensible to me to assemble the
important parts of the AST (i. e. the nodes that match class decls,
methods or fields ... control flow statements (if, for, while),
statements and expressions) and create an inheritance hierarchy
over that dimension. You could have a superclass 'Statement' with
subclasses like 'ForLoop', 'IfBranch', 'WhileLoop' etc, and You
could have 'Expression' as a superclass of 'UnaryExpression' and
'BinaryExpression'. For any BNF-Rule, e. g. "A -> B | C", where
'A' stands for an item which is of importance in the AST, I'd rather
expect the 'A' node to be able to accomodate either a 'B' or a 'C'
as an attribute, not as a subclass. For non-terminals on the BNF side
which exist for technical reasons, you might not want to have an
AST node at all. It might be helpful to grab an open-source compiler
which is implemented in some OO language, and study how it does
this part. Besides other examples, there's kjc, the kaffee java
compiler.

HTH, yours,
FLorian
[I agree, it's OK to make each rule into a subtree, but overkill to
try to make each one into a class. -John]


Chris F Clark

2006-04-12, 10:02 pm

Thomas Weiss wrote:
> I am looking for an
> efficient way to design abstract syntax trees in object-oriented
> languages.

.... [example showing inheirtance matching parsing stratgy snipped]

This method was been followed [with some success I presume] in the
Cocktail toolkit by Josef Gro\:lsh. Note, that your issue of some
rule descending from two parents and thus inheriting from both isn't
really as frequent as you would think. In reality, the productions
that get resused in multiple productions are natural roots of trees,
e.g. "statement", "expression". That is, you may see expression used
in multiple contexts, but not and-expression, or-expression,
plus-expression and the rest, those will only occur inside expression.
Therefore, if you find places where there are two references to the
same non-terminal, that non-terminal is a likely base class. The same
thing helps one break cycles, since expression is usually
self-recursive. Once one has made a non-terminal into a base class,
one doesn't make it into a derived class of itself.

Florian Liekweg replied:
> From your example, it seems that you are trying to turn every BNF rule
> into an inheritance relationship. I would advise against doing this
> in this strictness. It seems more sensible to me to assemble the
> important parts of the AST (i. e. the nodes that match class decls,
> methods or fields ... control flow statements (if, for, while),
> statements and expressions) and create an inheritance hierarchy
> over that dimension....


There is lots of sense to this approach (making your hierarchy
semantic not syntactic), and we have used it in Yacc++. If you
divorce your inheritance hierarchy from the syntax, you will find that
it is much flatter, and only captures those distinctions you want
preserved for semantic reasons. Doing so makes it much more likely,
that you can put member functions in base classes and have them
actually apply to the inherited classes.

In particular, a common pattern that I have seen replicated many times
is three essential base classes:

class ast-base // everything inherits from this
// and this is what pointers, point to

class tk-base is-a ast-base // tokens derive from this
class nt-base is-a ast-base // non-terminals derive from this

From, this point on one can make sub-divisions that are appropriate to
your application. I find there are generally at least two sub-classes
of tk-base.

class tk-sym-base is-a tk-base // tokens with "symbols" for variable spellings
// derive from this
class tk-fixed-base is-a tk-base // tokens with fixed spellings
// derive from this

As Florian suggested, you will find some likely base classes for
non-terminals to be expressions and statements.

class nt-stmt-base is-a nt-base // non-terminals representing statements
// derive from this
class nt-expr-base is-a nt-base // non-terminals representing expressions
// derive from this

I also find, that it is often useful to sub-divide the expression (or
statement) classes even more. The clases below are typical and come
from a Verilog compiler I did a few years back--the compiler has more
than these, but these give you the flavor of what I did.

class nt-expr-binary-base is-a nt-expr-base
// non-terminals representing binary expressions
// derive from this (e.g. + *)

class nt-expr-unary-base is-a nt-expr-base
// non-terminals representing unary expressions
// derive from this (e.g. unary -, ~)

class nt-expr-binary-relation-base is-a nt-expr-binary-base
// non-terminals representing relational binary expressions
// derive from this (e.g. < <= == != > >=)
class nt-expr-binary-logic-base is-a nt-expr-binary-base
// non-terminals representing logical binary expressions
// derive from this (e.g. && ||)
class nt-expr-binary-bitwise-base is-a nt-expr-binary-base
// non-terminals representing bit-wise binary expressions
// derive from this (e.g. & |)

Note that, at times, I have found it very useful to have a distinct
class for every different syntactic rule. That way in the class, I
know exactly what fields are well-defined. At other times, I have
found it conventient to group two distinct alternatives into the same
class as they have the same semantics. However, I never have a deep
inheritance heirarchy that matches the grammar. I may have a deep
inheritance hierarchy the represents my semantics so that I can
code-share exactly the right functions and define them only once, but
that's a different story.

Hope this helps,
-Chris

****************************************
*************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
Thomas Weiss

2006-04-16, 8:05 am

Thank very much for your answers. Be sure these help me a lot.

In fact, I'm trying to design an AST for an ASN.1 compiler, and rules
descending from multiple parents appear often in the ASN.1 grammar.

I will try to post here the results of my work, as soon as I reach
something consistent and interesting.

Thomas

"Chris F Clark" <cfc@shell01.TheWorld.com> a écrit dans le message de news:
> Thomas Weiss wrote:
> ... [example showing inheirtance matching parsing stratgy snipped]
>
> This method was been followed [with some success I presume] in the
> Cocktail toolkit by Josef Gro\:lsh. ...

Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com