Code Comments
Programming Forum and web based access to our favorite programming groups.hi, I want to write a recursive descent parser, and wonder if there is a LL(1) grammar for "c-style" variable assignement (i.e. "id=id=id+id" but not "id=id=id+id=2"). The following grammar is LL(2) I think: A := ID = A A := E E := T E' E':= + T E' E':= T := ID T := ( A ) I tried to convert it to LL(1), then I got something like that: A := ID A' A := ( A ) A':= = A A':= + T E' E := T E' E':= + T E' E':= T := ID T := ( A ) But then I lose the context. I don't no if the "id" in rule one is a left value or not. Does an LL(1) grammar exist for this problem? Or am I absolutely wrong with something? bye Chris
Post Follow-up to this messageweltraum@astrocat.de (Chris) schreibt: >I want to write a recursive descent parser, and wonder if there is a >LL(1) grammar for "c-style" variable assignement >(i.e. "id=id=id+id" but not "id=id=id+id=2"). IMO the problem is the required lookahead in the very last production, that can be a rvalue in the last place, or must be an lvalue if followed by another "=". In a typical solution the difference between rvalue and lvalue will be handled on semantical level, not on syntactical level in the grammar. Consider how fruitless a syntactical distinction between rvalue and lvalue expressions is in practice, when a parser has to try both ways anyhow. Parsing for an rvalue should always succeed, but must result in a mismatch when a "=" follows. Likewise a parse for an lvalue, in the first guess, will fail for most right hand sides of such an assignment expression. Parsing for both rvalues and lvalues at the same time is impossible in LL(1). An LL(1) grammar for this case IMO is possible only with dramatic restrictions, as that the lvalue grammar/language must be distinct from the rvalue language, distinguishable by the very first token of every expression. DoDi
Post Follow-up to this messageweltraum@astrocat.de (Chris) wrote:
# hi,
#
# I want to write a recursive descent parser, and wonder if there is a
# LL(1) grammar for "c-style" variable assignement
# (i.e. "id=id=id+id" but not "id=id=id+id=2").
#
# The following grammar is LL(2) I think:
#
# A := ID = A
# A := E
#
# E := T E'
# E':= + T E'
# E':=
#
# T := ID
# T := ( A )
<A> ::= ID = <a-expression>
FIRST: {ID}
<a-expression> ::= ID <a-suffix>
FIRST: {ID}
| (<expression> ) <suffix>
FIRST: {"("}
<a-suffix> :: = <a-expression>
FIRST: {"="}
| <suffix>
FIRST: {"+"}
<expression> ::= ID <a-suffix>
FIRST: {ID}
| (<expression> ) <suffix>
FIRST: {"("}
<suffix> :: = + <expression>
FIRST: {"+"}
| empty
FIRST: FOLLOWER(A)
The parse tree is going to be a bear to untangle, though.
The only problem is that in C, the thingie before the '=', the l-value, has
unbounded
size and can be nested deeply in an embedding-(-) production; potentially yo
u won't
know it's not a l-value until deep inside the parse and shifting any number
of tokens.
That means it won't be LL(k) for any k. But if the l-value can be described
as something
akin to a regular expression, then it is still LL(k).
--
SM Ryan http://www.rawbw.com/~wyrmwif/
Title does not dictate behaviour.
Post Follow-up to this messageChris asks: > I want to write a recursive descent parser, and wonder if there is a > LL(1) grammar for "c-style" variable assignement > (i.e. "id=id=id+id" but not "id=id=id+id=2"). ... > But then I lose the context. I don't no if the "id" in rule one is a > left value or not. Does an LL(1) grammar exist for this problem? Or > am I absolutely wrong with something? A little analysis will answer your question. 1) The RHS of an assignment can be a single ID, via A->E->T->ID 2) It can also be an assignment, A->ID = A Therefore, to distinguish those two cases, you need at least two tokens ID = (vs ID + or ID EOF). So, when you factor the ID out of the common prefix for those possibilities, you lose the distinction of how ID is used. You need the two tokens to see ID in context. Thus, your second grammar captures the fact that ID cannot be used in the non-desired way. That is, your second grammar does not allow ID+ID=ID. However, it does not build the types of AST trees that allow you to conveniently process the input. The problem gets worse if you try to allow an even richer C subset, e.g. *ID = 2 or ID[I+1] = ID[I]. In general, one needs more unbounded look-ahead (i.e. to the end of the expression) to distinguish all the cases. This means one needs an LR rather than LL grammar. This is an example of a typical class of problems (typed expressions) where grammars are not a good solution. I know of two other problems that are similar in nature. The first is distinguishing boolean vs. integer expressions when booleans can "convert" to integers. The second is dealing with NULL rows as special in SQL expressions. In both cases, there are grammatic solutions to the problem. That is, there are grammars that allow only "legal" programs to be accepted and enforce the desired "type" rules. However, in both cases, the grammar solution is horrendous (in that large portions of the grammar are duplicated with slight permutations, resulting in a maintenance nightmare). It might be possible to do a clean grammatic solution to "type" problems with 2-level aka VW grammars. However, your typical CFG (LL/LR grammar) is not likely to cut it. A better solution to these kind of problems is a 2nd pass (or phase). Build the parse tree with a "natural" grammar that doesn't try to enforce the "type" rules and allows syntactically correct but semantically wrong inputs. Then, apply the semantics to the tree. Look for left-subtrees of assignments that are disallowed expressions. Propragate the boolean/integer null/non-null-ness up the tree to places where it needs checking. Attribute grammars do this relatively well, and one can hand-implement most attribute traversals by hand quite easily if one doesn't have an attribute evaluator generator. Hope this helps, -Chris **************************************** ************************************ * Chris Clark Internet : compres@world.std.com Compiler Resources, Inc. Web Site : http://world.std.com/~compres 23 Bailey Rd voice : (508) 435-5016 Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours) ---------------------------------------------------------------------------- --
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread
Powered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.