Code Comments
Programming Forum and web based access to our favorite programming groups.Hi, I had a question - I know that every LL1 grammar is by definition an LR1 grammar. What I want to know is that : is every LL1 grammar an LALR1 grammar? In either case (No/Yes), I would be thankful if I could get a counterexample/proof for the claim or any pointers for the same. Thanks, Neelesh
Post Follow-up to this messageNeelesh Bodas wrote: > is every LL1 grammar an LALR1 grammar? > In either case (No/Yes), I would be thankful if I could get a > counterexample/proof for the claim or any pointers for the same. A counterexample found in Sippu and Soisalon-Soininen's monograph _Parsing_Theory_: S -> aA | bB A -> Cc | Dd B -> Cd | Dc C -> FE D -> FH E -> empty F -> empty H -> empty This grammar is SLL(1) and thus LL(1), but is not LALR(k) for any k: in the state reached when reading "aF" or "bF", one can reduce the empty string to E or H. We have lost track of the first token read---"a" or "b"---and the allowed lookaheads are "c" and "d" for both reductions. You will find in the same monograph that every reduced LL(1) grammar in which no nonterminal derives only the empty string is an LALR(1) grammar. And for k >= 2, there are reduced \epsilon-free LL(k) grammars which are not LALR(k), like the following one: S -> aA | bB A -> Cc | Dd B -> Cd | Dc C -> E D -> H E -> g H -> g This grammar is SLL(2) but not LALR(k) for any k. -- Hope that helps, Sylvain [Are LL1 languages, as opposed to grammars, LALR languages? -John]
Post Follow-up to this message> [Are LL1 languages, as opposed to grammars, LALR languages? -John]
Yes, they are:
Any LL(k) language has an LL(k) grammar, which is also LR(k). And
one can transform this LR(k) grammar into an equivalent SLR(1) grammar.
So LL languages are also LALR.
This inclusion is proper; correct me if I'm wrong, but I think the
following example shows it:
The language {c^n (a^n|b^n), n >= 0} has an LR(0) grammar
S -> A | B
A -> cAa | ca
B -> cBb | cb,
but no LL(k) grammar. One would need to see the first "a" or "b" in
order to decide between the recognition of "c^n a^n" or the
recognition of "c^n b^n", but these symbols can be arbitrarily far
away from the beginning of the input. And we cannot factor out the
"c^n" because we need to count the number of "a" or "b" after to check
it is equal to n.
--
Sylvain
Post Follow-up to this message> [Are LL1 languages, as opposed to grammars, LALR languages? -John] 1: Any LL(K) language is LR(K). 2: Any LR(K) language is LR(1). 3: Any LR(k) language is SLR(k). 4: Any SLR(K) grammar is LALR(K). Thus any LL(k) language is LALR(1). You may find proofs of 2 and 3 in: "Parsing Theory" by S.Sippo & E. Soisalon-soininen Vol 2, Springer-Verlag Karsten Nyblad 148f3wg02 at sneakemail dot com
Post Follow-up to this messageKarsten Nyblad <148f3wg02@sneakemail.com> wrote: > > 1: Any LL(K) language is LR(K). Sylvain Schmitz <schmitz@i3s.unice.fr> wrote: > Yes, they are: > Any LL(k) language has an LL(k) grammar, which is also LR(k). And > one can transform this LR(k) grammar into an equivalent SLR(1) grammar. > So LL languages are also LALR. Do you have any reference? -- Akim Demaille sent me an example where LL(1) isn't LR(1). :-) I reposted it here, but I have forgotten when. This seems to ne of the most often quoted mistakes. -- Hans Aberg
Post Follow-up to this messageHans Aberg wrote: > Karsten Nyblad <148f3wg02@sneakemail.com> wrote: > > > > > Sylvain Schmitz <schmitz@i3s.unice.fr> wrote: > > > > > Do you have any reference? -- Akim Demaille sent me an example where LL(1) > isn't LR(1). :-) I reposted it here, but I have forgotten when. This seems > to ne of the most often quoted mistakes. > > -- > Hans Aberg I could not find an reference right away. So: Let s1, s2, s3, s4, s5, and s6 be strings of terminals. Let V, W, and Z be strings of terminals and non terminals. Let A and B be non terminals. Two strings are concatenated by placing besides each other, i.e., s1 s2 is the concatenation of s1 and s2 and V s1 is the concatenation of V and s1. Assume that G is a grammar that is LL(K) but not LR(K) because of a reduce/reduce conflict between two rules A -> W Z and B -> Z. Let s1 s2 s3 be the shortest string such that there is a string of terminals and nonterminals V and three strings of terminals s4, s5, and s6 where: 1: s1, s2, and s3 can be derived from V, W, and Z, respectively, and 2: V W Z s4 s5 and V W Z s4 s6 and V A s4 s5 and V W B s4 s6 can be derived right most from G's start symbol, and 3: s4 is of length K. Then the top most state on the parser stack after parsing s1 s2 s3 of s1 s2 s3 s4 s5 will be the state with the conflict, and the symbols pushed on the stack will be V W Z. Let AL be an LL(K) automaton. This automaton has to decide whether or not A should be pushed on the stack no later than when s1 has been recognized, but in both case the automaton may continue recognizing s2 s3 s4. that that string is longer than K. Thus G cannot be LL(K). You can make a similar argument if the conflict is a stack/reduce conflict. How did I use that the conflict was in the LR(K) automaton and not in the LALR(K) automaton? In the example above I used that V W Z s4 s5 and V W Z s4 s6 that can be derived from the start symbol. Thus we know that V W Z s4 represents a prefix of some strings in L(G). An LR(K) parser will stop and report an error the moment the symbols pushed on the stack + the K symbols in the windows cannot be derived to a prefix of the language. That is not always the case when the parser is an SLR(K) or LALR(K) parser. There the symbols pushed on the stack can always be derived to a prefix of the language, but SLR(K) and LALR(K) automatons may perform reduce and stack operations with erroneous symbols in the window. (Stack operations only if K > 1.) Thus if the automaton in the argument above had been an LALR(K) or SLR(K) automaton, then it may not be possible to find a set of values for V and s4 such that V W Z on the stack will be reduced to V A when s4 is in the window even if there is no s5 such that V A s4 s5 can be derived from the start symbol. Thus the argument above depends on that LR(K) parsers do not perform any stack or reduce operations if the string of grammar symbols pushed on the stack, if that string concatenated with the string in the window is not a prefix of a string derived from the start symbol. Karsten Nyblad 148f3wg02 at sneakemail dot com
Post Follow-up to this messageHans Aberg wrote: > Do you have any reference? -- Akim Demaille sent me an example where LL(1) > isn't LR(1). :-) I reposted it here, but I have forgotten when. I could not find your post in the archives. Anyway this result is a very sure one; you can find a demonstration for it in _Parsing_Theory_ by Sippu and Soisalon-Soininen, vol. 2, pp. 239--248. To give a more intuitive answer, an LL(k) parser is like an LR(k) parser with a semantic action embedded at the very beginning of every single rule: it has to decide immediately what to do, only looking at the "k" first tokens from this point of the rule. From there, proper inclusion of the set of all LL(k) grammars in the set of LR(k) grammars is obvious. -- Regards, Sylvain
Post Follow-up to this messageAt 13:40 +0200 2005/05/02, Sylvain Schmitz wrote: >Hans Aberg wrote: > >I could not find your post in the archives. Anyway this result is a >very sure one; you can find a demonstration for it in >_Parsing_Theory_ by Sippu and Soisalon-Soininen, vol. 2, pp. >239--248. > >To give a more intuitive answer, an LL(k) parser is like an LR(k) >parser with a semantic action embedded at the very beginning of >every single rule: it has to decide immediately what to do, only >looking at the "k" first tokens from this point of the rule. From >there, proper inclusion of the set of all LL(k) grammars in the set >of LR(k) grammars is obvious. Let repost it here, then. It is from the Help-Bison mailing list <http://mail.gnu.org/mailman/listinfo/help-bison>, 17 Jan 2002, from Akim Demaille: Hans> It is probably LL(1) then (modulo tweaks), which => LALR(1), This is not absolutely true, although it is in practice. IIRC the result holds when there are no empty rules. See for instance http://compilers.iecc.com/comparch/article/93-09-025 or the errata of Andrew Appel about this book on compiler implementation: http://www.cs.princeton.edu/~appel/.../ml/errata.html Page 64. Figure 3.26 incorrectly shows LL(1) as a subset of SLR. In fact, LL(1) is not even a subset of LALR(1): there is an LL(1) grammar that is not LALR(1). Shouldn't this be in the comp.compilers FAQ? -- Hans Aberg [If it's only come up twice since 1993, that doesn't seem so frequent. -John ]
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread
Powered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.