Home > Archive > Compilers > April 2004 > a token scanner DFA for indirection operator * ?
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
a token scanner DFA for indirection operator * ?
|
|
|
| Hi:
I'm trying to write a token scanner for C, and I'm wondering if there is a
detrministic finite automata (DFA) or state machine for the '*'
indirection operator(IOP), or if differentiation between the multiply
operator and the IOP is better left to the parser. Right now my scanner
keeps track of the previous token and also has a lookahead char. My
biggest problem occurs when the previous token is a right parenthesis. In
this case, the parenthesis can enclolse:
1.an expression ( in which case '*' is the multiply operator )
2.a boolean expressian after an 'if' or 'while' ( in which case '*', if
followed by an alphabetic char or underscore, is an IOP )
3.the conditions for a 'for' statement.
I imagine that I could have a flag variable telling me whether the
previous right parenthesis token closed an arithmetic expression, but I
was wondering if I was overlooking something.
Is it possible to determine '*' with just the previous token and a
lookahead char, or do I need to introduce a flag?
thanks in advance
Ron Gesell
PS. I haven't looke at it yet, but I suspect the same situation can arise
with '&'.
[I think your life will be a lot easier if you interpret the tokens in
the parser, not the lexer. -John]
| |
| Alex Colvin 2004-04-03, 9:32 am |
| >I'm trying to write a token scanner for C, and I'm wondering if there is a
>detrministic finite automata (DFA) or state machine for the '*'
>indirection operator(IOP), or if differentiation between the multiply
>operator and the IOP is better left to the parser. Right now my scanner
This is usually left to the parser.
you have to distinguish
(int)*ptr
(width)*height
which requires knowing that "int" is a type (or typedef) in the current
scope.
--
mac the naïf
| |
| VBDis 2004-04-03, 10:36 am |
| RonG <rgesell@mb.sympatico.ca> schreibt:
>I'm trying to write a token scanner for C, and I'm wondering if there is a
>detrministic finite automata (DFA) or state machine for the '*'
>indirection operator(IOP), or if differentiation between the multiply
>operator and the IOP is better left to the parser.
Doesn't '*' have 3 meanings?
1. multiply
2. dereference
3. pointer (in declaration, cast-expression)
>Is it possible to determine '*' with just the previous token and a
>lookahead char, or do I need to introduce a flag?
For (3) the scanner must be able to distinguish between type names and other
identifiers.
>PS. I haven't looke at it yet, but I suspect the same situation can arise
>with '&'.
And with '++' and '--' as prefix or postfix operators, and '+' and '-' as unary
or binary operators...
>[I think your life will be a lot easier if you interpret the tokens in
>the parser, not the lexer. -John]
Agreed.
DoDi
| |
| Dmitry A. Kazakov 2004-04-03, 10:36 am |
| On 26 Mar 2004 22:38:44 -0500, RonG <rgesell@mb.sympatico.ca> wrote:
>I'm trying to write a token scanner for C, and I'm wondering if there is a
>detrministic finite automata (DFA) or state machine for the '*'
>indirection operator(IOP), or if differentiation between the multiply
>operator and the IOP is better left to the parser. Right now my scanner
>keeps track of the previous token and also has a lookahead char. My
>biggest problem occurs when the previous token is a right parenthesis. In
>this case, the parenthesis can enclolse:
>1.an expression ( in which case '*' is the multiply operator )
>2.a boolean expressian after an 'if' or 'while' ( in which case '*', if
> followed by an alphabetic char or underscore, is an IOP )
>3.the conditions for a 'for' statement.
>
>I imagine that I could have a flag variable telling me whether the
>previous right parenthesis token closed an arithmetic expression, but I
>was wondering if I was overlooking something.
>Is it possible to determine '*' with just the previous token and a
>lookahead char, or do I need to introduce a flag?
Whether "*" is an unary or dyadic operator depends on the context. You
can keep track on its switching:
prefix context ->
operand (= literal/identifier) ->
postfix context ->
infix context (dyadic operator, comma) ->
prefix context ->
....
When "*" appears in the prefix context, then it is an unary prefix
operator, when it does in the infix context, then it is a dyadic
operator and so on.
>thanks in advance
>Ron Gesell
>
>PS. I haven't looke at it yet, but I suspect the same situation can arise
>with '&'.
>[I think your life will be a lot easier if you interpret the tokens in
>the parser, not the lexer. -John]
Yes, when both lexer and parser are integrated in one.
--
Regards,
Dmitry Kazakov
www.dmitry-kazakov.de
| |
| Cassé Hugues 2004-04-15, 1:35 pm |
| Yes, there is a way. Please look at my own C front-end in OCAML
(http://casse.hugues.free.fr/projects/frontc.html). The "*" may be used
prefixed or infixed in C expression: not a problem for LR(1) parser. In
type definition, the "context" is very different and no problem arises.
Issues come from the declaration types using "typedef" names. In case
of blocks "{" ... "}", knowledge of symbol kind is required in order to
separate declaring part from statement part.
Hugues Cassé.
Alex Colvin wrote:
>
>
> This is usually left to the parser.
> you have to distinguish
> (int)*ptr
> (width)*height
> which requires knowing that "int" is a type (or typedef) in the current
> scope.
|
|
|
|
|