Home > Archive > Unix Programming > September 2004 > Yacc & lex problem










 

Author Yacc & lex problem
Farid Benzakour

2004-09-27, 4:01 pm

I'd like the parser to recognize the following line:
XXX.CC = YYY.CC
How should I implement the '.'?

I tried the following:

Line:
LeftObject '.' Position EQUAL RightObject '.' Position

But the '.' is not recognized: the lexer returns all of "XXX.CC" as LeftObject.
I don't want to make the '.' (dot) a separate token, as strings may contain dots.
How should I proceed to recognize XXX and YYY ?

THX


Jens.Toerring@physik.fu-berlin.de

2004-09-27, 4:01 pm

Farid Benzakour <farid.benzakour@karelis-systemes.com> wrote:
> I'd like the parser to recognize the following line:
> XXX.CC = YYY.CC
> How should I implement the '.'?


> I tried the following:


> Line:
> LeftObject '.' Position EQUAL RightObject '.' Position


> How should I proceed to recognize XXX and YYY ?


It's unclear what you mean. If e.g. "XXX.CC" represents a single
entity, you need the lexer to recognize it and pass a single token
for it to the parser, with the "XXX" and the "CC" parts stored in
the union for the semantic values of the token (yylval). You could
have something like this in the parser:

%union {
    struct {
        const char *pre_dot;
        const char *after_dot;
    } r_and_l;
    char *str_ptr;
}

%token <r_and_l> LeftObject RightObject
%token <str_ptr> StringObject
%token EQUAL

%%

Line:
    LeftObject EQUAL RightObject { do_something( $1, $3 ); }
    ;
....

And in the lexer you could use e.g. the following to detect the
tokens and, if necessary, split them up before passing them to the
parser:

"XXX.CC" {
        yylval.r_and_l.pre_dot = "XXX";
        yylval.r_and_l.after_dot = "CC";
        return LeftObject;
    }

"YYY.CC" {
        yylval.r_and_l.pre_dot = "YYY";
        yylval.r_and_l.after_dot = "CC";
        return RightObject;
    }

"="     return EQUAL;
[\t ]+  ;   /* drop whitespace */

> But the '.' is not recognized. LeftObject is equal to "XXX.CC"
> I don't want to put a token on the '.' (dot) as strings may contain dots.


There won't be a dot when you pass a string to the parser: the
parser only gets a single integer (the token) that it interprets as
standing for a string, while the string itself (with the embedded
dots) is stored in the union for the semantic values, as I tried to
indicate with the 'str_ptr' member and the additional token
'StringObject' above.

If, on the other hand, "XXX", "YYY" and "CC" are entities completely
on their own, but you don't want to allow whitespace between e.g. the
"XXX", the dot and the "CC", you need to pass all whitespace (as well
as the dot) to the parser and allow for whitespace in the grammar
wherever it may appear. Then you would need something like this in
the parser:

Line:
    opt_ws LeftObject '.' Position opt_ws EQUAL opt_ws RightObject '.' Position opt_ws
    ;
opt_ws: /* empty */ | WS ;

where opt_ws matches either nothing or a single WS token, so whitespace
is accepted at both ends of the line and around the '=' but not next
to the dots. In the lexer you now would have e.g.

"XXX"   return LeftObject;
"YYY"   return RightObject;
"CC"    return Position;
"."     return '.';
"="     return EQUAL;
[\t ]+  return WS;

Regards, Jens
--
\ Jens Thoms Toerring ___ Jens.Toerring@physik.fu-berlin.de
\__________________________ http://www.toerring.de

Copyright 2008 codecomments.com