For Programmers: Free Programming Magazines  


Home > Archive > Compilers > April 2004 > a token scanner DFA for indirection operator * ?









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author a token scanner DFA for indirection operator * ?
RonG

2004-03-27, 12:29 am

Hi:

I'm trying to write a token scanner for C, and I'm wondering if there is a
detrministic finite automata (DFA) or state machine for the '*'
indirection operator(IOP), or if differentiation between the multiply
operator and the IOP is better left to the parser. Right now my scanner
keeps track of the previous token and also has a lookahead char. My
biggest problem occurs when the previous token is a right parenthesis. In
this case, the parenthesis can enclolse:
1.an expression ( in which case '*' is the multiply operator )
2.a boolean expressian after an 'if' or 'while' ( in which case '*', if
followed by an alphabetic char or underscore, is an IOP )
3.the conditions for a 'for' statement.

I imagine that I could have a flag variable telling me whether the
previous right parenthesis token closed an arithmetic expression, but I
was wondering if I was overlooking something.
Is it possible to determine '*' with just the previous token and a
lookahead char, or do I need to introduce a flag?

thanks in advance
Ron Gesell

PS. I haven't looke at it yet, but I suspect the same situation can arise
with '&'.
[I think your life will be a lot easier if you interpret the tokens in
the parser, not the lexer. -John]
Alex Colvin

2004-04-03, 9:32 am

>I'm trying to write a token scanner for C, and I'm wondering if there is a
>detrministic finite automata (DFA) or state machine for the '*'
>indirection operator(IOP), or if differentiation between the multiply
>operator and the IOP is better left to the parser. Right now my scanner


This is usually left to the parser.
you have to distinguish
(int)*ptr
(width)*height
which requires knowing that "int" is a type (or typedef) in the current
scope.


--
mac the naïf

VBDis

2004-04-03, 10:36 am

RonG <rgesell@mb.sympatico.ca> schreibt:

>I'm trying to write a token scanner for C, and I'm wondering if there is a
>detrministic finite automata (DFA) or state machine for the '*'
>indirection operator(IOP), or if differentiation between the multiply
>operator and the IOP is better left to the parser.


Doesn't '*' have 3 meanings?
1. multiply
2. dereference
3. pointer (in declaration, cast-expression)

>Is it possible to determine '*' with just the previous token and a
>lookahead char, or do I need to introduce a flag?


For (3) the scanner must be able to distinguish between type names and other
identifiers.

>PS. I haven't looke at it yet, but I suspect the same situation can arise
>with '&'.


And with '++' and '--' as prefix or postfix operators, and '+' and '-' as unary
or binary operators...

>[I think your life will be a lot easier if you interpret the tokens in
>the parser, not the lexer. -John]


Agreed.

DoDi
Dmitry A. Kazakov

2004-04-03, 10:36 am

On 26 Mar 2004 22:38:44 -0500, RonG <rgesell@mb.sympatico.ca> wrote:

>I'm trying to write a token scanner for C, and I'm wondering if there is a
>detrministic finite automata (DFA) or state machine for the '*'
>indirection operator(IOP), or if differentiation between the multiply
>operator and the IOP is better left to the parser. Right now my scanner
>keeps track of the previous token and also has a lookahead char. My
>biggest problem occurs when the previous token is a right parenthesis. In
>this case, the parenthesis can enclolse:
>1.an expression ( in which case '*' is the multiply operator )
>2.a boolean expressian after an 'if' or 'while' ( in which case '*', if
> followed by an alphabetic char or underscore, is an IOP )
>3.the conditions for a 'for' statement.
>
>I imagine that I could have a flag variable telling me whether the
>previous right parenthesis token closed an arithmetic expression, but I
>was wondering if I was overlooking something.
>Is it possible to determine '*' with just the previous token and a
>lookahead char, or do I need to introduce a flag?


Whether "*" is an unary or dyadic operator depends on the context. You
can keep track on its switching:

prefix context ->
operand (= literal/identifier) ->
postfix context ->
infix context (dyadic operator, comma) ->
prefix context ->
....

When "*" appears in the prefix context, then it is an unary prefix
operator, when it does in the infix context, then it is a dyadic
operator and so on.

>thanks in advance
>Ron Gesell
>
>PS. I haven't looke at it yet, but I suspect the same situation can arise
>with '&'.
>[I think your life will be a lot easier if you interpret the tokens in
>the parser, not the lexer. -John]


Yes, when both lexer and parser are integrated in one.

--
Regards,
Dmitry Kazakov
www.dmitry-kazakov.de
Cassé Hugues

2004-04-15, 1:35 pm

Yes, there is a way. Please look at my own C front-end in OCAML
(http://casse.hugues.free.fr/projects/frontc.html). The "*" may be used
prefixed or infixed in C expression: not a problem for LR(1) parser. In
type definition, the "context" is very different and no problem arises.
Issues come from the declaration types using "typedef" names. In case
of blocks "{" ... "}", knowledge of symbol kind is required in order to
separate declaring part from statement part.

Hugues Cassé.

Alex Colvin wrote:
>
>
> This is usually left to the parser.
> you have to distinguish
> (int)*ptr
> (width)*height
> which requires knowing that "int" is a type (or typedef) in the current
> scope.

Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com