Code Comments
Programming Forum and web based access to our favorite programming groups.Background Consider any linear grammar G of a regular language L in which: a) the designated "empty" or "null string" symbol e is not an element of G's terminal alphabet Vt; b) G contains the rewrite rule Z -> z, which always terminates a derivation in G since z is an element of Vt; Now consider a grammar G' of a regular language L which is just like G except that: a) the terminal alphabet Vt' of G' does contain the null/empty string symbol e b) G' contains the rewrite rule Z -> z e instead of Z -> z. In any derivation D in G, the deepest subtree in D will be a tree consisting of one parent with one child: [ z ] Z In the corresponding derivation D' in G', the deepest subtree will be a tree consisting of one parent and two children: [ z e ] Z Question Can anyone think of any non-trivial reason why the grammars G and G' should be distinguished one from the other within formal language theory or automata theory or any application thereof ? The answer to this question has both formal and empirical consequences, otherwise I would not be wasting anyone's time to double-check my own opinion on the matter. Thanks for whatever time you can afford to spend considering this matter.
Post Follow-up to this messagedhalitsky@cumulativeinquiry.com (David Halitsky) writes: > Background > > Consider any linear grammar G of a regular language L in which: > > a) the designated "empty" or "null string" symbol e is not an > element of G's terminal alphabet Vt; > > b) G contains the rewrite rule Z -> z, which always terminates a > derivation in G since z is an element of Vt; > > Now consider a grammar G' of a regular language L which is just > like G except that: > > a) the terminal alphabet Vt' of G' does contain the null/empty string > symbol e > > b) G' contains the rewrite rule Z -> z e instead of Z -> z. > ... snip... > Question > > Can anyone think of any non-trivial reason why the grammars G and G' > should be distinguished one from the other within formal language > theory or automata theory or any application thereof ? > David, I'd argue with your use of e. In automata theory, e is not usually interpreted as a symbol in the alphabet Vt. Rather, it's the identity element you add to Vt+ (the free semigroup over Vt) to get Vt* (free monoid). So by definition, (x e) = (x). e may be a string in a language L which is a subset of Vt* but is not an element of Vt. If you were to try to think of e as an element of Vt' = Vt + e which behaved as though it were an identity for Vt'*, there would be an obvious morphism from strings in Vt'* to Vt* which just erases the e's, so you would see the same language over Vt*. The only way to extend this morphism to actions is to make the grammar action on e be the identity action as well (i.e. no action associated with the e in any production ). The problem that comes in is that you introduce nondeterminism if you try to associate any sort of action with e. Then, since ee = e = e* = e^k, you have ambiguity. If you tried to stretch the notation even further and had a production Y ::= e y generating another language, you could probably concoct some trouble composing the languages L1L2. Hope this helps... I'm more familiar with automata than grammar, though. - Kenn
Post Follow-up to this messageThe empty string is not a terminal symbol in a grammar. It is a representation of the lack of a symbol. So G == G'. Rewrite the productions in yacc format to see the equivalence.
Post Follow-up to this messageI am grateful to KH for his lucid and simple explanation concerning the difference between considering 'e' as an element of Vt and considering 'e' as present in strings which are otherwise exclusively over Vt. If Robert Low is corrrect in his observation that a right linear grammar is still linear even if its last possible production intoduces two terminals while all possible prior productions introduce just one (see thread at sci.math, sci.logic, or comp.theory entitled "Z -> z vs Z -> z e"), then it is not necessary for me to use a production of the form "Z -> z e" in order to construct the argument I'm leading up to (in that thread.) But even though the wrong-headed "Z -> z e" is no longer needed, I am nonetheless grateful to KH for pointing out its wrong-headedness in a way which has not been done by responders at sci.math, sci.logic, nor comp.theory. PS - much of Chomsky's later applied work relies critically on several different types of empty symbols which would have to be regarded as elements of Vt if they were to be formalized. But since C and some of his key students have moved entirely away from formalisms as being premature for applied linguistics, these "applied" empty symbols are not sufficiently well-defined to contradict your point concerning 'e' in the context of a formal theory of language/ automata. Thanks very much again.
Post Follow-up to this messageOn 08/10/2004 04:35 PM, Kenn Heinrich wrote: [snip] > I'd argue with your use of e. In automata theory, e is not usually > interpreted as a symbol in the alphabet Vt. Rather, it's the identity > element you add to Vt+ (the free semigroup over Vt) to get Vt* (free > monoid). So by definition, (x e) = (x). e may be a string in a Kenn, Is there something like an identity element for the alternative operator, |. I'm thinking that in (x e) the space is the implicit multiplication operator with identity e, but I was wondering if there were something similar for the addition (i.e. | ) operator. In addition, could this be what parsers are "in theory" using when the lexer's return # (as in yacc) to signal the end of input? -Larry
Post Follow-up to this messageLarry Evans <cppljevans@cox-internet.com> writes:
> Is there something like an identity element for the alternative
> operator, |.
The alternative operator is just set union, where the sets are
languages. So the identity is the empty set, the language {}.
There is no string (including e) that matches {}.
> In addition, could this be what parsers are "in theory"
> using when the lexer's return # (as in yacc) to signal the end of input?
>
Dunno. I don't usually think of e or {} as entailing any sort of
action, but maybe you can :-)
Regards,
- Kenn
Post Follow-up to this messageLarry Evans wrote: > Is there something like an identity element for the alternative > operator, |. I'm thinking that in (x e) the space is the implicit > multiplication operator with identity e, but I was wondering > if there were something similar for the addition (i.e. | ) > operator. In addition, could this be what parsers are "in theory" > using when the lexer's return # (as in yacc) to signal the end of input? It is interesting (to me, at least) that Larry Evans (LE) should bring up the matter of # as an end-of-input symbol. When I first became interested in the type of right (or left) linear derivation in which all productions save the last introduce one terminal and the last introduces two terminals, I immediately thought of interpreting the rightmost(leftmost) deepest terminal as a "STOP" symbol. This is because any such right(left) linear derivation has an automata-theoretic interpretation as a model of a finite-state process, and it seemed natural to me to interpret the "extra" terminal introduced by the last production in the derivation as a designated "STOP" symbol, or "final state" symbol, just as the symbol to the left of the rewrite arrow in the first production can be considered a designated initial symbol of the grammar (or "initial-state" symbol.) However, since I don't have even the slightest REAL knowledge of finite-state automata theory nor its relation to formal language theory, I hestitated to bring this up as a possible interpretation of the deepest rightmost(leftmost) terminal in the class of derivations in which I am interested. If the language-theoretic/automata-theoretic community were to agree that this is as legitimate and useful interpretation of the "extra" rightmost (leftmost) and deepest terminal in the class of derivations under consideration, then researchers now working under the auspices of Cumulative Inquriy can say something quite interesting about how to "double" such derivations in an intuitvely natural way which can be expressed both algebraically and geometrically. In this regard, see the "oDAG/POSET Problem" material and the "oDAGs invariant under a half-turn" material at: http://www.CumulativeInquiry.com/Problems The upshot of the research in these two threads at the above URL is that we can create the derivation tree of the derivation: A -> a B B -> b C C -> c D D -> d f by starting with the derivation tree of the derivation: C -> c D D -> d f and algebraically and/or geometrically "doubling" this derivation tree in a straightforward manner by a symmetry operation suggested by the work of Robert Jamison (Clemson) and William F. Mann (CI) on the notion of "trees invariant under a half-turn". This "doubling" of derivation trees, moreover, may shed some new formal light on the applied problem of figuring out why certain protein structures which do NOT appear to involve gene segment duplication nonetheless converge on structures which DO appear to involve gene segment duplication.
Post Follow-up to this messageI am not sure what you are trying to show, but you may be interested in the following argument that was made some years ago. Clearly the following productions are all equivalent. A --> a B C A --> a B epsilon C A --> a B epsilon C epsilon A --> a epsilon B epsilon C epsilon A --> epsilon a epsilon B epsilon C epsilon and so a context free grammar can be expressed as A --> epsilon a B C B --> epsilon b B --> epsilon e C --> epsilon c D D --> epsilon d without changing its meaning from A --> a B C B --> b B --> e C --> c D D --> d Now this LL(1) grammar can be thought of as an LL(0) grammar with a single conflict between productions 2 and 3, that conflict being on the empty string. The resolution is to look ahead one symbol, which is the standard method of LL(1) construction. So the strong LL(k) grammars can be fully described in terms of conflict resolution, and the method is orthogonal down to the case where k=0. http://home.earthlink.net/~slkpg/
Post Follow-up to this messageThanks very much for taking the time to resuscitate the example and the argument in your post to this thread ! Does your use of the example mean that you think it is OK (in language theoretic examples anyway) to think of epsilon (the emptry string symbol) as being an element of Vt which can appear to the right of the arrow in productions ? Reason I'm asking is cause Kenn (in a prior post to this thread) says that it's not really correct to look at epsilon in this way. Any additional light you could shed would be very much appreciated. Best regards Dave
Post Follow-up to this message> Does your use of the example mean that you think it is OK (in > language theoretic examples anyway) to think of epsilon (the emptry > string symbol) as being an element of Vt which can appear to the > right of the arrow in productions ? Not in Vt. Nonterminals and actions can also appear on the rhs of a production. My point was that it is not incorrect to explicitly say that there is nothing between the symbols by putting the empty string there. Redundant and hard to read, but not incorrect. A better phrasing might be that the LL(0) conflict is on the imaginary, or implied empty string at the beginning of every production. Since k=0, the conflict is on nothing, but I needed to formalize this in a way that was consistent with the cases where k>0. SLK Parser Generator: http://home.earthlink.net/~slkpg/
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread
Powered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.