Re: Q2. Why do you split a monolithic grammar into the lexing and parsing rules?
valentin tihomirov

2005-03-08, 3:59 am

> I'm assuming this would be like being able to do this in C:
> int else = 4;
> printf("%i",else);


Nice example; I was thinking of something Pascal-ish:
procedure do();



> A character is a different class of object than a word, and the two
> operate on different levels.


One bit, one sound, or one char is not enough to encode all the
notions in the world. Fortunately for mankind, the ancient Latins, as
opposed to the Chinese, were wise enough to construct words from
letters. You are right: most words encode notions in natural
languages, where the letters correspond to sounds in oral speech
(the English language has lost this natural relationship). Such a
separation looks natural in the human world. Formal languages do not
have this two-level restriction; they may have as many levels as
necessary. When I speak about a context-free (CF) language, I see a
syntax tree.


> Tokens that are considered to have direct
> meaning, like ELSE, IF, WHILE, etc.


... and the wrong meaning will be assigned by the lexer to ELSE in
your example above. ELSE may have any sense depending on the context
(as do most words in natural languages, which are considered PS or
CS). Recognizers do not worry about meaning at all (both tokens and
letters are symbols in formal languages).
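
To make the point concrete, here is a minimal sketch of a conventional
context-free lexer. The keyword set, the token kind names and the `lex`
function are all made up for illustration; they are not taken from any
real tool. The lexer looks only at the spelling of a word, so it has no
way to tell that `else` sits in a declarator position.

```python
import re

# Assumed keyword set for this sketch only.
KEYWORDS = {"if", "else", "while", "int"}

# One lexeme per match: a word, a number, or a single punctuation mark.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<word>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<punct>[=;(),]))"
)

def lex(text):
    """Classify each lexeme by spelling alone, ignoring syntactic context."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        if m.group("word"):
            word = m.group("word")
            # The decision is made here, before any parser has seen the token.
            kind = "KEYWORD" if word in KEYWORDS else "ID"
            tokens.append((kind, word))
        elif m.group("num"):
            tokens.append(("NUM", m.group("num")))
        else:
            tokens.append(("PUNCT", m.group("punct")))
    return tokens
```

With this sketch, `lex("int else = 4;")` tags `else` as `KEYWORD`, so a
parser expecting an identifier after `int` has to reject the input.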



> and because of this, with a unified grammar I would want to have all
> "words" constructed out of "letters" and nothing else derived from
> "letters",


Which class do literal tokens belong to? IF, THEN, and ELSE are
separator marks:
IF expr THEN stat ELSE stat;

Some "words" (known as phrases) consist of two or more words.


> so there would be no practical difference between a separate
> lexer and parser and a unified parser...


The advantage is a known context of symbols. Although we do not use CS
languages, words still have a context-dependent meaning. In addition,
the lexer cannot calculate the LL(k) lookahead, as it cannot predict
what comes after the end of a token; it assumes that any token can
follow, resulting in huge ambiguities (ANTLR just suppresses them so
as not to confuse the user). Meanwhile, the top-down parser knows the
start symbol and has the FOLLOW(token) set available.
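
A rough sketch of what "known context" buys you: the parser tells the
tokenizer which symbols are acceptable at the current rule position.
This is not a real LL(k) computation, only an illustration; the toy rule,
the `<ID>`/`<NUM>` class names and both function names are hypothetical.

```python
import re

# One lexeme: a word, a number, or '=' / ';'.
LEXEME_RE = re.compile(r"\s*([A-Za-z_]\w*|\d+|[=;])")

def next_token(text, pos, expected):
    """Read one lexeme and classify it using the parser's expected set.

    `expected` holds literal spellings (e.g. "int", ";") and the class
    names "<ID>" / "<NUM>", which can never collide with a lexeme.
    """
    m = LEXEME_RE.match(text, pos)
    if m is None:
        raise SyntaxError(f"no token at offset {pos}")
    lexeme = m.group(1)
    if lexeme in expected:
        kind = lexeme                      # a literal the rule asks for here
    elif lexeme[0].isdigit() and "<NUM>" in expected:
        kind = "<NUM>"
    elif not lexeme[0].isdigit() and lexeme not in "=;" and "<ID>" in expected:
        kind = "<ID>"                      # any word is an ID if only ID fits
    else:
        raise SyntaxError(f"expected one of {sorted(expected)}, got {lexeme!r}")
    return kind, lexeme, m.end()

def parse_declaration(text):
    """Toy rule (assumed): decl := 'int' <ID> '=' <NUM> ';'"""
    pos, result = 0, []
    for expected in ({"int"}, {"<ID>"}, {"="}, {"<NUM>"}, {";"}):
        kind, lexeme, pos = next_token(text, pos, expected)
        result.append((kind, lexeme))
    return result
```

Here `parse_declaration("int else = 4;")` accepts `else` as `<ID>`,
because at that position the rule leaves no other reading.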



> So I don't think the problem is artificial, or at least the problem is
> not artificial in quite that way.


Humans bring their (natural) way of thinking into formal language
theory, which already has a general device for language description.


> The most natural solution I can think of that works purely by
> grammar is to make the lexer and parser non-deterministic, so 'else'
> above would come out of the lexer as the set { ID, ELSE }.


As you see, the lexer prematurely assigns a meaning to the "word"
regardless of the context. The context is the position in a
production rule where a token (a syntactically structured sequence of
symbols) can appear.
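
For comparison, the quoted set-valued idea could look roughly like the
sketch below: the lexer reports every kind a lexeme might be, and the
choice is postponed until a rule position is known. The keyword set and
the names `candidate_kinds` and `resolve` are assumptions made for this
example only.

```python
# Assumed keyword set for this sketch only.
KEYWORDS = {"if", "then", "else"}

def candidate_kinds(lexeme):
    """Lexer side: return every token kind the lexeme could be."""
    kinds = set()
    if lexeme.isidentifier():
        kinds.add("ID")
        if lexeme in KEYWORDS:
            kinds.add(lexeme.upper())   # e.g. 'else' may also be ELSE
    elif lexeme.isdigit():
        kinds.add("NUM")
    else:
        kinds.add(lexeme)               # punctuation stands for itself
    return kinds

def resolve(lexeme, expected):
    """Parser side: keep only the kinds that fit the current rule position."""
    fitting = candidate_kinds(lexeme) & set(expected)
    if len(fitting) != 1:
        raise SyntaxError(f"{lexeme!r}: expected one of {sorted(expected)}")
    return fitting.pop()
```

In this sketch `candidate_kinds("else")` is the set `{"ID", "ELSE"}`;
`resolve("else", {"ID"})` picks `ID`, while
`resolve("else", {"ELSE", ";"})` picks `ELSE`.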
[The most plausible argument I've seen for syntax design without
reserved words was for PL/I, which had so many of them that it was
impractical for programmers to remember them all. Assuming you
have a more parsimonious language than PL/I or Cobol, the ability
to use IF or FOR as a variable is more likely to be confusing than
useful. -John]

Norm Dresner

2005-03-09, 4:00 am

"valentin tihomirov" <spam@abelectron.com> wrote in message
>
> Nice example; I was thinking of something Pascal-ish:
> procedure do();
>
> One bit, one sound, or one char is not enough to encode all the
> notions in the world. Fortunately for mankind, the ancient Latins, as
> opposed to the Chinese, were wise enough to construct words from
> letters. You are right: most words encode notions in natural
> languages, where the letters correspond to sounds in oral speech
> (the English language has lost this natural relationship). Such a
> separation looks natural in the human world.


There's a very interesting paper in a recent issue of the scientific
journal Nature whose conclusion is that, because the Chinese language
is processed by the visual center of the brain instead of the auditory
one, Chinese students enjoy a ~5 point advantage on IQ tests!
There's no way to determine -- at least right now -- if this actually
makes them smarter, but they will appear so in some circumstances.

As a Westerner who's learned (some) Japanese (a very interesting
amalgam of both ideographic and phonetic components), I can assure you
that having the ideograph encode both the meaning and the sound isn't
really a drawback.

Norm
[Chinese has the unusual characteristic that the spoken languages are
as different as French and Portuguese, but they're all written the
same. Perhaps there's an inspiration for us programmers there. -John]
