For Programmers: Free Programming Magazines  


Home > Archive > Compilers > May 2005 > JavaCUP and actions









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author JavaCUP and actions
Svend Tofte

2005-04-27, 3:59 am

Hello,

I'm currently writing a parser, which will do an online parse of the
input language. That is, emit the translated code (XML) as soon as
possible. If you use a top-down approach, this is simple enough.

However, we'd like to use a bottom-up parser, simply because we've had
experience with one such parser (JavaCUP), and many languages seem LR
grammar styled.

Given an ordinary expression grammar, for example:

E ::= E PLUS:p T | T ;
T ::= T TIMES:t F | F ;
F ::= LPAREN:l E RPAREN:r | NUMBER:n ;

JavaCUP eats it up fine. However, if I proceed to add actions to the
first production, such as:

E ::= {: System.out.print("<E>"); :}
E
PLUS:p
{: System.out.print(p); :}
T
{: System.out.print("</E>"); :}

Then I get some six shift/reduce errors. Which on one hand seems fine
(how can it know, so early in the parse, that this production is it).

But ... reading in the dragon book, and taking in what my teacher told
me, and reading the JavaCUP manual (under "The Grammar"), I feel that
this should be possible. That JavaCUP would merely "delay", the
execution of actions, that it could not yet guarentee was to be
executed.

However, this doesn't seem to be the case.

So I'd like to know, if what I want is possible at all (it seems like
it should be), and if so, does JavaCUP do this? If not, is there
another LALR parser generator which does this?

And if no to all those, I will have to use a top-down approach
naturally.

Regards,
Svend
[Adding interior actions changes the grammar. When you write this (using
yacc syntax which I know better):

a: b { blah } c { farp } d ;

it is treated like this

a: b xx1 c xx2 d;

xx1: /* empty */ { blah };
xx2: /* empty */ { farp };

In the absence of the embedded rules, the parser can defer deciding whether
it's seen an "a" rule until it parses the "d", but with embedded rules, it
has to decide as soon as it's seen the "b".

The solution is to put all your debug stuff at the end of the rule. -John]
Chris Dodd

2005-05-01, 9:12 pm

[posted and mailed]

"Svend Tofte" <stofte@gmail.com> wrote in news:05-04-069@comp.compilers:
> Given an ordinary expression grammar, for example:
>
> E ::= E PLUS:p T | T ;
> T ::= T TIMES:t F | F ;
> F ::= LPAREN:l E RPAREN:r | NUMBER:n ;
>
> JavaCUP eats it up fine. However, if I proceed to add actions to the
> first production, such as:
>
> E ::= {: System.out.print("<E>"); :}
> E
> PLUS:p
> {: System.out.print(p); :}
> T
> {: System.out.print("</E>"); :}
>
> Then I get some six shift/reduce errors. Which on one hand seems fine
> (how can it know, so early in the parse, that this production is it).


As the moderator notes, the problem is that embedded actions introduce
extra rules, and those extra rules introduce conflicts. What's
particularly annoying here is that there generally don't need to be
conflicts -- the conflicts tend to come from identical embedded
actions. So if the parser would recognize the actions are identical,
and use the same extra symbol for them, it would be ok. Fortunately,
you can get around this by adding your own fake symbols.

For example, from your example you probably have:

E ::= {: System.out.print("<E>"); :}
E
PLUS:p
{: System.out.print(p); :}
T
{: System.out.print("</E>"); :}
| {: System.out.print("<E>"); :}
T
{: System.out.print("</E>"); :}
;

Which gives you a reduce/reduce conflict on the two extra symbols
introduced for the "{: System.out.print("<E>"); :}" rules. If you
instead do it yourself with a single rule:

E ::= _begin_E_
E
PLUS:p
{: System.out.print(p); :}
T
{: System.out.print("</E>"); :}
| _begin_E_
T
{: System.out.print("</E>"); :}
;
_begin_E_ ::= {: System.out.print("<E>"); :} ;

the conflict goes away...

Alternatively, you can use a parser generator such as btyacc which
automatically combines such identical actions.

Chris Dodd
cdodd@acm.org
Chris F Clark

2005-05-01, 9:12 pm

Svend Tofte asked an excellent question about delaying actions, with
the following grammar fragment:

E ::= {: System.out.print("<E>"); :}
E
PLUS:p
{: System.out.print(p); :}
T
{: System.out.print("</E>"); :}

It is possible to delay the actions, but it is actually quite tricky
to do so, and your grammar fragment illustrates why. When parsing an
E, the grammar wants to immediately print out a message "<E>".
However, note that it is going to immediately make a left-recursive
call to the E rule. So, it may need to print out another "<E>" also.
In fact, it may need to print out an unbounded number of "<E>"s at the
start of an E rule, as many as it needs to dive into nested E rules to
satisfy the left recursion.

BTW, this is why the grammar isn't LL(k). An LL parser generator,
needs to know by looking at the first k tokens of a rule whether it
applies or not. It cannot left recurse, because if it did, you get
this unbounded guessing problem.

In contrast LR parser generators, push the necessary information onto
the stack, and ignore the left recursion issue, knowing that some k
number of tokens after the recursive rule has completed, the parser
can figure out whether it recursed or not. However, any actions that
occur before the end of the rule, force the parser generator to decide
early and may cause conflicts and generally will if the rule is left
recursive. (See the moderators note about the typical implementation
strategy. It isn't the only one, but all strategies have roughly
equivalent issues.)

Hence, the moderator's suggestion, put any actions at the end of the
rule. This allows the LR parser generator to avoid the conflicts.
and an even better strategy is to build an AST and do the actions as a
walk of the AST. It moves your actions from being online to offline,
but it makes many things much simpler.

If you really need an online parse, you actually might be happier with
a top-down (LL) tool. The strictures of LL require you to write your
grammar such that it can be processed online. If you use an LR tool,
the advantage in left recursion (and thus more natural to my mind
grammars) is inherently due to its offline nature. Any place an LR
grammar is not also LL, is a place where there is potentially the
ability to look past an unbounded number of tokens before knowing what
to parse, which is essentially a requirement to do the parse offline.

Hope this helps,
-Chris

****************************************
*************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com