For Programmers: Free Programming Magazines  


Home > Archive > Compilers > September 2004 > Question re the (non-)equivalence of Z -> z and Z -> z e (e the empty string)









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Question re the (non-)equivalence of Z -> z and Z -> z e (e the empty string)
David Halitsky

2004-08-09, 3:56 am

Background

Consider any linear grammar G of a regular language L in which:

a) the designated "empty" or "null string" symbol e is not an
element of G's terminal alphabet Vt;

b) G contains the rewrite rule Z -> z, which always terminates a
derivation in G since z is an element of Vt;

Now consider a grammar G' of a regular language L which is just
like G except that:

a) the terminal alphabet Vt' of G' does contain the null/empty string
symbol e

b) G' contains the rewrite rule Z -> z e instead of Z -> z.

In any derivation D in G, the deepest subtree in D will be a tree
consisting of one parent with one child:

[ z ]
Z

In the corresponding derivation D' in G', the deepest subtree will be
a tree consisting of one parent and two children:

[ z e ]
Z

Question

Can anyone think of any non-trivial reason why the grammars G and G'
should be distinguished one from the other within formal language
theory or automata theory or any application thereof ?

The answer to this question has both formal and empirical
consequences, otherwise I would not be wasting anyone's time to
double-check my own opinion on the matter.

Thanks for whatever time you can afford to spend considering this
matter.
Kenn Heinrich

2004-08-10, 9:03 pm

dhalitsky@cumulativeinquiry.com (David Halitsky) writes:

> Background
>
> Consider any linear grammar G of a regular language L in which:
>
> a) the designated "empty" or "null string" symbol e is not an
> element of G's terminal alphabet Vt;
>
> b) G contains the rewrite rule Z -> z, which always terminates a
> derivation in G since z is an element of Vt;
>
> Now consider a grammar G' of a regular language L which is just
> like G except that:
>
> a) the terminal alphabet Vt' of G' does contain the null/empty string
> symbol e
>
> b) G' contains the rewrite rule Z -> z e instead of Z -> z.
>

.... snip...

> Question
>
> Can anyone think of any non-trivial reason why the grammars G and G'
> should be distinguished one from the other within formal language
> theory or automata theory or any application thereof ?
>


David,

I'd argue with your use of e. In automata theory, e is not usually
interpreted as a symbol in the alphabet Vt. Rather, it's the identity
element you add to Vt+ (the free semigroup over Vt) to get Vt* (free
monoid). So by definition, (x e) = (x). e may be a string in a
language L which is a subset of Vt* but is not an element of Vt.

If you were to try to think of e as an element of Vt' = Vt + e which
behaved as though it were an identity for Vt'*, there would be an
obvious morphism from strings in Vt'* to Vt* which just erases the
e's, so you would see the same language over Vt*. The only way to
extend this morphism to actions is to make the grammar action on e be
the identity action as well (i.e. no action associated with the e in
any production ).

The problem that comes in is that you introduce nondeterminism if you
try to associate any sort of action with e. Then, since ee = e = e* =
e^k, you have ambiguity. If you tried to stretch the notation even
further and had a production Y ::= e y generating another language,
you could probably concoct some trouble composing the languages L1L2.

Hope this helps... I'm more familiar with automata than grammar,
though.

- Kenn
dtf

2004-08-10, 9:03 pm

The empty string is not a terminal symbol in a grammar. It is a
representation of the lack of a symbol. So G == G'. Rewrite the productions
in yacc format to see the equivalence.

David Halitsky

2004-08-11, 4:00 pm

I am grateful to KH for his lucid and simple explanation concerning
the difference between considering 'e' as an element of Vt and
considering 'e' as present in strings which are otherwise exclusively
over Vt.

If Robert Low is corrrect in his observation that a right linear
grammar is still linear even if its last possible production intoduces
two terminals while all possible prior productions introduce just one
(see thread at sci.math, sci.logic, or comp.theory entitled "Z -> z vs
Z -> z e"), then it is not necessary for me to use a production of the
form "Z -> z e" in order to construct the argument I'm leading up to
(in that thread.)

But even though the wrong-headed "Z -> z e" is no longer needed, I am
nonetheless grateful to KH for pointing out its wrong-headedness in a
way which has not been done by responders at sci.math, sci.logic, nor
comp.theory.

PS - much of Chomsky's later applied work relies critically on several
different types of empty symbols which would have to be regarded as
elements of Vt if they were to be formalized. But since C and some of
his key students have moved entirely away from formalisms as being
premature for applied linguistics, these "applied" empty symbols are
not sufficiently well-defined to contradict your point concerning 'e'
in the context of a formal theory of language/ automata.

Thanks very much again.
Larry Evans

2004-08-11, 8:58 pm

On 08/10/2004 04:35 PM, Kenn Heinrich wrote:
[snip]
> I'd argue with your use of e. In automata theory, e is not usually
> interpreted as a symbol in the alphabet Vt. Rather, it's the identity
> element you add to Vt+ (the free semigroup over Vt) to get Vt* (free
> monoid). So by definition, (x e) = (x). e may be a string in a


Kenn,

Is there something like an identity element for the alternative
operator, |. I'm thinking that in (x e) the space is the implicit
multiplication operator with identity e, but I was wondering
if there were something similar for the addition (i.e. | )
operator. In addition, could this be what parsers are "in theory"
using when the lexer's return # (as in yacc) to signal the end of input?

-Larry
Kenn Heinrich

2004-08-13, 8:57 pm

Larry Evans <cppljevans@cox-internet.com> writes:

> Is there something like an identity element for the alternative
> operator, |.


The alternative operator is just set union, where the sets are
languages. So the identity is the empty set, the language {}.
There is no string (including e) that matches {}.

> In addition, could this be what parsers are "in theory"
> using when the lexer's return # (as in yacc) to signal the end of input?
>


Dunno. I don't usually think of e or {} as entailing any sort of
action, but maybe you can :-)

Regards,

- Kenn
David Halitsky

2004-08-13, 8:57 pm

Larry Evans wrote:

> Is there something like an identity element for the alternative
> operator, |. I'm thinking that in (x e) the space is the implicit
> multiplication operator with identity e, but I was wondering
> if there were something similar for the addition (i.e. | )
> operator. In addition, could this be what parsers are "in theory"
> using when the lexer's return # (as in yacc) to signal the end of input?


It is interesting (to me, at least) that Larry Evans (LE) should bring
up the matter of # as an end-of-input symbol.

When I first became interested in the type of right (or left) linear
derivation in which all productions save the last introduce one
terminal and the last introduces two terminals, I immediately thought
of interpreting the rightmost(leftmost) deepest terminal as a "STOP"
symbol. This is because any such right(left) linear derivation has an
automata-theoretic interpretation as a model of a finite-state
process, and it seemed natural to me to interpret the "extra" terminal
introduced by the last production in the derivation as a designated
"STOP" symbol, or "final state" symbol, just as the symbol to the left
of the rewrite arrow in the first production can be considered a
designated initial symbol of the grammar (or "initial-state" symbol.)

However, since I don't have even the slightest REAL knowledge of
finite-state automata theory nor its relation to formal language
theory, I hestitated to bring this up as a possible interpretation of
the deepest rightmost(leftmost) terminal in the class of derivations
in which I am interested.

If the language-theoretic/automata-theoretic community were to agree
that this is as legitimate and useful interpretation of the "extra"
rightmost (leftmost) and deepest terminal in the class of derivations
under consideration, then researchers now working under the auspices
of Cumulative Inquriy can say something quite interesting about how to
"double" such derivations in an intuitvely natural way which can be
expressed both algebraically and geometrically. In this regard, see
the "oDAG/POSET Problem" material and the "oDAGs invariant under a
half-turn" material at:

http://www.CumulativeInquiry.com/Problems

The upshot of the research in these two threads at the above URL is
that we can create the derivation tree of the derivation:

A -> a B
B -> b C
C -> c D
D -> d f

by starting with the derivation tree of the derivation:

C -> c D
D -> d f

and algebraically and/or geometrically "doubling" this derivation tree
in a straightforward manner by a symmetry operation suggested by the
work of Robert Jamison (Clemson) and William F. Mann (CI) on the
notion of "trees invariant under a half-turn".

This "doubling" of derivation trees, moreover, may shed some new
formal light on the applied problem of figuring out why certain
protein structures which do NOT appear to involve gene segment
duplication nonetheless converge on structures which DO appear to
involve gene segment duplication.
SLK Parsers

2004-08-16, 3:57 am

I am not sure what you are trying to show, but you may be interested in the
following argument that was made some years ago.

Clearly the following productions are all equivalent.

A --> a B C
A --> a B epsilon C
A --> a B epsilon C epsilon
A --> a epsilon B epsilon C epsilon
A --> epsilon a epsilon B epsilon C epsilon

and so a context free grammar can be expressed as

A --> epsilon a B C
B --> epsilon b
B --> epsilon e
C --> epsilon c D
D --> epsilon d

without changing its meaning from

A --> a B C
B --> b
B --> e
C --> c D
D --> d

Now this LL(1) grammar can be thought of as an LL(0) grammar with a single
conflict between productions 2 and 3, that conflict being on the empty
string. The resolution is to look ahead one symbol, which is the standard
method of LL(1) construction. So the strong LL(k) grammars can be fully
described in terms of conflict resolution, and the method is orthogonal down
to the case where k=0.

http://home.earthlink.net/~slkpg/
David Halitsky

2004-08-23, 4:02 pm

Thanks very much for taking the time to resuscitate
the example and the argument in your post to this
thread !

Does your use of the example mean that you think it
is OK (in language theoretic examples anyway) to
think of epsilon (the emptry string symbol) as being
an element of Vt which can appear to the right of
the arrow in productions ?

Reason I'm asking is cause Kenn (in a prior post
to this thread) says that it's not really correct
to look at epsilon in this way.

Any additional light you could shed would be very
much appreciated.

Best regards
Dave

SLK Parsers

2004-09-03, 4:03 pm

> Does your use of the example mean that you think it is OK (in
> language theoretic examples anyway) to think of epsilon (the emptry
> string symbol) as being an element of Vt which can appear to the
> right of the arrow in productions ?


Not in Vt. Nonterminals and actions can also appear on the rhs of a
production. My point was that it is not incorrect to explicitly say that
there is nothing between the symbols by putting the empty string there.
Redundant and hard to read, but not incorrect.

A better phrasing might be that the LL(0) conflict is on the imaginary, or
implied empty string at the beginning of every production. Since k=0, the
conflict is on nothing, but I needed to formalize this in a way that was
consistent with the cases where k>0.

SLK Parser Generator: http://home.earthlink.net/~slkpg/
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com