Home > Archive > Compilers > November 2004 > ll(1) and variable assignment...
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
ll(1) and variable assignment...
|
|
|
| hi,
I want to write a recursive descent parser, and wonder if there is a
LL(1) grammar for "c-style" variable assignement
(i.e. "id=id=id+id" but not "id=id=id+id=2").
The following grammar is LL(2) I think:
A := ID = A
A := E
E := T E'
E':= + T E'
E':=
T := ID
T := ( A )
I tried to convert it to LL(1), then I got something like that:
A := ID A'
A := ( A )
A':= = A
A':= + T E'
E := T E'
E':= + T E'
E':=
T := ID
T := ( A )
But then I lose the context. I don't no if the "id" in rule one is a
left value or not.
Does an LL(1) grammar exist for this problem? Or am I absolutely wrong
with something?
bye
Chris
| |
|
| weltraum@astrocat.de (Chris) schreibt:
>I want to write a recursive descent parser, and wonder if there is a
>LL(1) grammar for "c-style" variable assignement
>(i.e. "id=id=id+id" but not "id=id=id+id=2").
IMO the problem is the required lookahead in the very last production,
that can be a rvalue in the last place, or must be an lvalue if
followed by another "=".
In a typical solution the difference between rvalue and lvalue will be
handled on semantical level, not on syntactical level in the
grammar. Consider how fruitless a syntactical distinction between
rvalue and lvalue expressions is in practice, when a parser has to try
both ways anyhow. Parsing for an rvalue should always succeed, but
must result in a mismatch when a "=" follows. Likewise a parse for an
lvalue, in the first guess, will fail for most right hand sides of
such an assignment expression. Parsing for both rvalues and lvalues at
the same time is impossible in LL(1).
An LL(1) grammar for this case IMO is possible only with dramatic
restrictions, as that the lvalue grammar/language must be distinct
from the rvalue language, distinguishable by the very first token of
every expression.
DoDi
| |
| SM Ryan 2004-11-07, 3:57 pm |
| weltraum@astrocat.de (Chris) wrote:
# hi,
#
# I want to write a recursive descent parser, and wonder if there is a
# LL(1) grammar for "c-style" variable assignement
# (i.e. "id=id=id+id" but not "id=id=id+id=2").
#
# The following grammar is LL(2) I think:
#
# A := ID = A
# A := E
#
# E := T E'
# E':= + T E'
# E':=
#
# T := ID
# T := ( A )
<A> ::= ID = <a-expression>
FIRST: {ID}
<a-expression> ::= ID <a-suffix>
FIRST: {ID}
| (<expression> ) <suffix>
FIRST: {"("}
<a-suffix> :: = <a-expression>
FIRST: {"="}
| <suffix>
FIRST: {"+"}
<expression> ::= ID <a-suffix>
FIRST: {ID}
| (<expression> ) <suffix>
FIRST: {"("}
<suffix> :: = + <expression>
FIRST: {"+"}
| empty
FIRST: FOLLOWER(A)
The parse tree is going to be a bear to untangle, though.
The only problem is that in C, the thingie before the '=', the l-value, has unbounded
size and can be nested deeply in an embedding-(-) production; potentially you won't
know it's not a l-value until deep inside the parse and shifting any number of tokens.
That means it won't be LL(k) for any k. But if the l-value can be described as something
akin to a regular expression, then it is still LL(k).
--
SM Ryan http://www.rawbw.com/~wyrmwif/
Title does not dictate behaviour.
| |
| Chris F Clark 2004-11-15, 3:57 am |
| Chris asks:
> I want to write a recursive descent parser, and wonder if there is a
> LL(1) grammar for "c-style" variable assignement
> (i.e. "id=id=id+id" but not "id=id=id+id=2").
....
> But then I lose the context. I don't no if the "id" in rule one is a
> left value or not. Does an LL(1) grammar exist for this problem? Or
> am I absolutely wrong with something?
A little analysis will answer your question.
1) The RHS of an assignment can be a single ID, via A->E->T->ID
2) It can also be an assignment, A->ID = A
Therefore, to distinguish those two cases, you need at least two
tokens ID = (vs ID + or ID EOF). So, when you factor the ID out of
the common prefix for those possibilities, you lose the distinction of
how ID is used. You need the two tokens to see ID in context.
Thus, your second grammar captures the fact that ID cannot be used in
the non-desired way. That is, your second grammar does not allow
ID+ID=ID. However, it does not build the types of AST trees that
allow you to conveniently process the input.
The problem gets worse if you try to allow an even richer C subset,
e.g. *ID = 2 or ID[I+1] = ID[I]. In general, one needs more unbounded
look-ahead (i.e. to the end of the expression) to distinguish all the
cases. This means one needs an LR rather than LL grammar.
This is an example of a typical class of problems (typed expressions)
where grammars are not a good solution. I know of two other problems
that are similar in nature. The first is distinguishing boolean
vs. integer expressions when booleans can "convert" to integers.
The second is dealing with NULL rows as special in SQL expressions.
In both cases, there are grammatic solutions to the problem. That is,
there are grammars that allow only "legal" programs to be accepted and
enforce the desired "type" rules. However, in both cases, the grammar
solution is horrendous (in that large portions of the grammar are
duplicated with slight permutations, resulting in a maintenance
nightmare). It might be possible to do a clean grammatic solution to
"type" problems with 2-level aka VW grammars. However, your typical
CFG (LL/LR grammar) is not likely to cut it.
A better solution to these kind of problems is a 2nd pass (or phase).
Build the parse tree with a "natural" grammar that doesn't try to
enforce the "type" rules and allows syntactically correct but
semantically wrong inputs. Then, apply the semantics to the
tree. Look for left-subtrees of assignments that are disallowed
expressions. Propragate the boolean/integer null/non-null-ness up the
tree to places where it needs checking. Attribute grammars do this
relatively well, and one can hand-implement most attribute traversals
by hand quite easily if one doesn't have an attribute evaluator
generator.
Hope this helps,
-Chris
****************************************
*************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------
|
|
|
|
|