For Programmers: Free Programming Magazines  


Home > Archive > Compilers > May 2005 > LALR1 and LL1









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author LALR1 and LL1
Neelesh Bodas

2005-04-11, 3:58 am

Hi,
I had a question -
I know that every LL1 grammar is by definition an LR1 grammar. What I
want to know is that :
is every LL1 grammar an LALR1 grammar?
In either case (No/Yes), I would be thankful if I could get a
counterexample/proof for the claim or any pointers for the same.

Thanks,
Neelesh
Sylvain Schmitz

2005-04-16, 3:58 pm

Neelesh Bodas wrote:
> is every LL1 grammar an LALR1 grammar?
> In either case (No/Yes), I would be thankful if I could get a
> counterexample/proof for the claim or any pointers for the same.


A counterexample found in Sippu and Soisalon-Soininen's monograph
_Parsing_Theory_:

S -> aA | bB
A -> Cc | Dd
B -> Cd | Dc
C -> FE
D -> FH
E -> empty
F -> empty
H -> empty

This grammar is SLL(1) and thus LL(1), but is not LALR(k) for any k: in
the state reached when reading "aF" or "bF", one can reduce the empty
string to E or H. We have lost track of the first token read---"a" or
"b"---and the allowed lookaheads are "c" and "d" for both reductions.

You will find in the same monograph that every reduced LL(1) grammar in
which no nonterminal derives only the empty string is an LALR(1)
grammar. And for k >= 2, there are reduced \epsilon-free LL(k) grammars
which are not LALR(k), like the following one:

S -> aA | bB
A -> Cc | Dd
B -> Cd | Dc
C -> E
D -> H
E -> g
H -> g

This grammar is SLL(2) but not LALR(k) for any k.

--
Hope that helps,

Sylvain
[Are LL1 languages, as opposed to grammars, LALR languages? -John]
Sylvain Schmitz

2005-04-27, 3:59 am

> [Are LL1 languages, as opposed to grammars, LALR languages? -John]
Yes, they are:
Any LL(k) language has an LL(k) grammar, which is also LR(k). And
one can transform this LR(k) grammar into an equivalent SLR(1) grammar.
So LL languages are also LALR.

This inclusion is proper; correct me if I'm wrong, but I think the
following example shows it:
The language {c^n (a^n|b^n), n >= 0} has an LR(0) grammar
S -> A | B
A -> cAa | ca
B -> cBb | cb,
but no LL(k) grammar. One would need to see the first "a" or "b" in
order to decide between the recognition of "c^n a^n" or the
recognition of "c^n b^n", but these symbols can be arbitrarily far
away from the beginning of the input. And we cannot factor out the
"c^n" because we need to count the number of "a" or "b" after to check
it is equal to n.

--
Sylvain
Karsten Nyblad

2005-04-27, 3:59 am

> [Are LL1 languages, as opposed to grammars, LALR languages? -John]

1: Any LL(K) language is LR(K).
2: Any LR(K) language is LR(1).
3: Any LR(k) language is SLR(k).
4: Any SLR(K) grammar is LALR(K).

Thus any LL(k) language is LALR(1).

You may find proofs of 2 and 3 in: "Parsing Theory" by S.Sippo & E.
Soisalon-soininen Vol 2, Springer-Verlag

Karsten Nyblad
148f3wg02 at sneakemail dot com
Hans Aberg

2005-05-01, 9:12 pm

Karsten Nyblad <148f3wg02@sneakemail.com> wrote:

>
> 1: Any LL(K) language is LR(K).


Sylvain Schmitz <schmitz@i3s.unice.fr> wrote:

> Yes, they are:
> Any LL(k) language has an LL(k) grammar, which is also LR(k). And
> one can transform this LR(k) grammar into an equivalent SLR(1) grammar.
> So LL languages are also LALR.


Do you have any reference? -- Akim Demaille sent me an example where LL(1)
isn't LR(1). :-) I reposted it here, but I have forgotten when. This seems
to ne of the most often quoted mistakes.

--
Hans Aberg
Karsten Nyblad

2005-05-01, 9:12 pm

Hans Aberg wrote:
> Karsten Nyblad <148f3wg02@sneakemail.com> wrote:
>
>
>
>
> Sylvain Schmitz <schmitz@i3s.unice.fr> wrote:
>
>
>
>
> Do you have any reference? -- Akim Demaille sent me an example where LL(1)
> isn't LR(1). :-) I reposted it here, but I have forgotten when. This seems
> to ne of the most often quoted mistakes.
>
> --
> Hans Aberg


I could not find an reference right away. So:

Let s1, s2, s3, s4, s5, and s6 be strings of terminals.
Let V, W, and Z be strings of terminals and non terminals.
Let A and B be non terminals.
Two strings are concatenated by placing besides each other, i.e., s1 s2
is the concatenation of s1 and s2 and V s1 is the concatenation of V and s1.

Assume that G is a grammar that is LL(K) but not LR(K) because of a
reduce/reduce conflict between two rules A -> W Z and B -> Z. Let s1 s2
s3 be the shortest string such that there is a string of terminals and
nonterminals V and three strings of terminals s4, s5, and s6 where:
1: s1, s2, and s3 can be derived from V, W, and Z, respectively, and
2: V W Z s4 s5 and V W Z s4 s6 and V A s4 s5 and V W B s4 s6 can be
derived right most from G's start symbol, and
3: s4 is of length K.
Then the top most state on the parser stack after parsing s1 s2 s3 of s1
s2 s3 s4 s5 will be the state with the conflict, and the symbols pushed
on the stack will be V W Z.

Let AL be an LL(K) automaton. This automaton has to decide whether or
not A should be pushed on the stack no later than when s1 has been
recognized, but in both case the automaton may continue recognizing s2
s3 s4. that that string is longer than K. Thus G cannot be LL(K).

You can make a similar argument if the conflict is a stack/reduce conflict.

How did I use that the conflict was in the LR(K) automaton and not
in the LALR(K) automaton? In the example above I used that V W Z s4 s5
and V W Z s4 s6 that can be derived from the start symbol. Thus we know
that V W Z s4 represents a prefix of some strings in L(G). An LR(K)
parser will stop and report an error the moment the symbols pushed on
the stack + the K symbols in the windows cannot be derived to a prefix
of the language. That is not always the case when the parser is an
SLR(K) or LALR(K) parser. There the symbols pushed on the stack can
always be derived to a prefix of the language, but SLR(K) and LALR(K)
automatons may perform reduce and stack operations with erroneous
symbols in the window. (Stack operations only if K > 1.) Thus if the
automaton in the argument above had been an LALR(K) or SLR(K) automaton,
then it may not be possible to find a set of values for V and s4 such
that V W Z on the stack will be reduced to V A when s4 is in the window
even if there is no s5 such that V A s4 s5 can be derived from the start
symbol.

Thus the argument above depends on that LR(K) parsers do not perform any
stack or reduce operations if the string of grammar symbols pushed on
the stack, if that string concatenated with the string in the window is
not a prefix of a string derived from the start symbol.

Karsten Nyblad
148f3wg02 at sneakemail dot com

Sylvain Schmitz

2005-05-03, 4:01 am

Hans Aberg wrote:
> Do you have any reference? -- Akim Demaille sent me an example where LL(1)
> isn't LR(1). :-) I reposted it here, but I have forgotten when.


I could not find your post in the archives. Anyway this result is a
very sure one; you can find a demonstration for it in _Parsing_Theory_
by Sippu and Soisalon-Soininen, vol. 2, pp. 239--248.

To give a more intuitive answer, an LL(k) parser is like an LR(k) parser
with a semantic action embedded at the very beginning of every single
rule: it has to decide immediately what to do, only looking at the "k"
first tokens from this point of the rule. From there, proper inclusion
of the set of all LL(k) grammars in the set of LR(k) grammars is obvious.

--
Regards,

Sylvain
Hans Aberg

2005-05-03, 4:01 am

At 13:40 +0200 2005/05/02, Sylvain Schmitz wrote:
>Hans Aberg wrote:
>
>I could not find your post in the archives. Anyway this result is a
>very sure one; you can find a demonstration for it in
>_Parsing_Theory_ by Sippu and Soisalon-Soininen, vol. 2, pp.
>239--248.
>
>To give a more intuitive answer, an LL(k) parser is like an LR(k)
>parser with a semantic action embedded at the very beginning of
>every single rule: it has to decide immediately what to do, only
>looking at the "k" first tokens from this point of the rule. From
>there, proper inclusion of the set of all LL(k) grammars in the set
>of LR(k) grammars is obvious.


Let repost it here, then. It is from the Help-Bison mailing list
<http://mail.gnu.org/mailman/listinfo/help-bison>, 17 Jan 2002, from
Akim Demaille:
[color=darkred]

Hans> It is probably LL(1) then (modulo tweaks), which => LALR(1),

This is not absolutely true, although it is in practice. IIRC the
result holds when there are no empty rules. See for instance

http://compilers.iecc.com/comparch/article/93-09-025

or the errata of Andrew Appel about this book on compiler
implementation:

http://www.cs.princeton.edu/~appel/.../ml/errata.html

Page 64. Figure 3.26 incorrectly shows LL(1) as a subset of
SLR. In fact, LL(1) is not even a subset of LALR(1): there is
an LL(1) grammar that is not LALR(1).

Shouldn't this be in the comp.compilers FAQ?
--
Hans Aberg
[If it's only come up twice since 1993, that doesn't seem so frequent. -John]

Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com