Newbie: Scanner/Parser question
Michael Bielser

2004-04-22, 11:37 am

Hello
I am extending a language (called GAMS, syntax is a mix of C and
Fortran) for a research project (I don't have access to the
scanner/parser definition of GAMS, just some written docu).

In a first step, I would just like to scan/parse my additional language
elements. My problem: As soon as a token of the "base language" (=GAMS)
is encountered, I (obviously) get a "token not expected" error. How can
I skip everything (i.e. treat it like a kind of whitespace) other than
my new definitions?

Any help is appreciated!

Kind regards
Mike

P.S. This is my first venture into the scanner/parser realm, so please
bear with me even if I seem to ask trivial things.

Sascha Doerdelmann

2004-04-23, 9:33 am

Michael Bielser <mike.bielser@ior.unizh.ch> wrote:
> I am extending a language (called GAMS, syntax is a mix of C and
> Fortran) for a research project (I don't have access to the
> scanner/parser definition of GAMS, just some written docu).


Any reason why you use Smalltalk for that instead of an arbitrary parser generator?

Cheers
Sascha

Michael Bielser

2004-04-27, 4:11 am

Sascha Doerdelmann wrote:
> Michael Bielser <mike.bielser@ior.unizh.ch> wrote:
>
>
>
> Any reason why you use Smalltalk for that instead of an arbitrary parser generator?
>
> Cheers
> Sascha


Yes: The scanner/parser part is only one (small) part of the whole
project. There's GUI stuff to do, other background modules...
While I come from a C/C++ background, I didn't feel able to
achieve the set goals in the short time available. So I did some
evaluation/comparison (mainly Java, Smalltalk, Python) and ended up with
Smalltalk (I liked the pure object approach, the simple syntax and the
rich class libraries; I use VW 7.2 NC).
Do you suggest I had better abandon Smalltalk?

Regards
Mike

Chris Uppal

2004-04-27, 7:04 am

Xref: kermit comp.lang.smalltalk:75840

Michael Bielser wrote:

> Yes: The scanner/parser part is only one (small) part of the whole
> project. There's GUI stuff to do, other background modules...
> While I come with a C/C++ background, I didn't feel like being able to
> achieve the set goals in the short time available. So I did some
> evaluation/comparison (mainly Java, Smalltalk, Python) and ended up with
> Smalltalk (I liked the pure object approach, the simple syntax and the
> rich class libraries (I use VW 7.2 NC)).
> Do you suggest I better abandon Smalltalk?


If I understand you correctly (and I may not, I'm a bit fuddled about what you
are really asking), you are wanting to do some parsing of a different language
(GAMS?) from your program that will be written in Smalltalk, or C++, or
whatever.

If that's right, then you'll need to write the parser/scanner in whatever
language you choose to use (there is no "magical" extra ability to write
parsers/scanners in Smalltalk compared with any other language). Or, better,
use an automated parser/scanner generator. Such generators are available for
all the languages you mention, and they all work (from the programmer's point
of view) in very much the same way.

I haven't used any of the Smalltalk parser generators (in fact I've only ever
used the Yacc[Bison]/Lex[flex] combo from C). But if I were doing a parser
today, then I think I'd start with the most recent of the generators, SmaCC,
from the Refactory <http://www.refactory.com/Software/SmaCC/>. There are
others.

You /may/ need to learn more about parsing in general before you can make
effective use of the tool, but there are many books and tutorials around (they
will typically either not refer to any particular tool, or refer to a tool
other than SmaCC, but that shouldn't matter much -- as I said, these tools are
all quite similar).

-- chris


Sascha Doerdelmann

2004-04-27, 7:47 am

Michael Bielser <mike.bielser@ior.unizh.ch> wrote:

[ reason why use Smalltalk instead of arbitrary parser generator ]
> Yes: The scanner/parser part is only one (small) part of the whole
> project. There's GUI stuff to do, other background modules...
> While I come with a C/C++ background, I didn't feel like being able to
> achieve the set goals in the short time available. So I did some
> evaluation/comparison (mainly Java, Smalltalk, Python) and ended up with
> Smalltalk (I liked the pure object approach, the simple syntax and the
> rich class libraries (I use VW 7.2 NC)).
> Do you suggest I better abandon Smalltalk?


Not at all. Welcome to Smalltalk!

I am not an expert in the field but my advice is:

1. Keep in mind that the framework is mainly designed for parsing
Smalltalk. Don't just reuse the "Scanner" as is; e.g., provide a new
ScannerTable. It might be easier to just copy aspects of the design
instead of using the classes.

2. Have a look at as many of the available parsers as possible. To
start with: Have a look at the implementations of the Refactoring Browser,
the Regex parcel and the XML stuff.

Cheers
Sascha

Lex Spoon

2004-04-27, 3:11 pm

"Chris Uppal" <chris.uppal@metagnostic.REMOVE-THIS.org> writes:
> I haven't used any of the Smalltalk parser generators (in fact I've only ever
> used the Yacc[Bison]/Lex[flex] combo from C). But if I were doing a parser
> today, then I think I'd start with the most recent of the generators, SmaCC,
> from the Refactory <http://www.refactory.com/Software/SmaCC/>. There are
> others.


Well, SmaCC (né T-Gen) is unusual in having a nice GUI for developing
your grammar. It has tabs for entering little test strings to tokenize
and/or parse and/or parse with *your* AST nodes instead of generic AST
nodes supplied in the library. It also lets you choose variant
algorithms right in the tool.

I recommend that language people take a look at SmaCC at some point
just to see what is possible in this kind of tool.

-Lex

Michael Bielser

2004-04-28, 2:33 am

Let me summarize:
- Yes, I have to parse GAMS programs within my program (that could be
built in any "general" language like C++, Java,... I decided to use
Smalltalk; see my comments why)
- And no, you can't build the parser in GAMS itself (it's not a general
programming language, but a very specialized modeling language)
- I know there is no "magical" extra ability to write parsers/scanners
in Smalltalk compared with any other language (again, see my earlier
comments why Smalltalk...)
- Yes, I am using an automated parser/scanner generator, namely SmaCC.
- I also compared some automated parser/scanner generators available in
different languages (mainly in C, C++, Java and Python) and SmaCC is by
far the easiest to use, with the fastest turnaround time (from the point
of view of someone who is not a parser/compiler specialist).

Basically, my question was rather a general parser question (How can I
skip a few lines of the source to parse?). I asked here because I'm
neither a Smalltalk expert nor have I used SmaCC before (and I, maybe
falsely, assumed that there might be some Smalltalk/SmaCC peculiarities
I had tripped over, as a newcomer).

I'm sorry for any confusion caused; thanks for all the comments.

Regards
Mike

Eliot Miranda

2004-04-28, 1:53 pm



Michael Bielser wrote:
> Let me summarize:
> - Yes, I have to parse GAMS programs within my program (that could be
> built in any "general" language like C++, Java,... I decided to use
> Smalltalk; see my comments why)
> - And no, you can't build the parser in GAMS itself (it's not a general
> programming language, but a very specialized modeling language)
> - I know there is no "magical" extra ability to write parsers/scanners
> in Smalltalk compared with any other language (again, see my earlier
> comments why Smalltalk...)
> - Yes, I am using an automated parser/scanner generator, namely SmaCC.
> - I also compared some automated parser/scanner generators available in
> different languages (mainly in C, C++, Java and Python) and SmaCC is by
> far the most easiest to use with the fastest turnaround time (from a non
> parser/compiler specialist point of view).
>
> Basically, my question was rather a general parser question (How can I
> skip a few lines of the source to parse?). I asked here because I'm
> neither a Smalltalk expert nor did I previously use SmaCC (and I, maybe
> falsely, assumed that there might be some Smalltalk/SmaCC peculiarities
> I had tripped upon, as newcomer).


Well, I'm not sure this is the answer you want to hear but Smalltalk
parsers typically take their input from a stream (some general instance
of ReadStream) and that stream can be a subsequence of some underlying
sequence. For example, ReadStream on: someString from: 100 to: 199
will provide a stream on 100 characters of someString from index 100
through 199. The parser doesn't need to know the details of the actual
input; this can be managed by/abstracted away by the input stream.
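
The same idea carries over to other languages. A minimal sketch in Python rather than Smalltalk, assuming a hypothetical `parse` function that only ever reads from a stream; `substream`, `parse` and the sample text are invented for illustration and are not part of any Smalltalk or SmaCC API:

```python
import io

def substream(text, first, last):
    """Return a stream over text[first..last], 1-based and inclusive,
    mirroring the Smalltalk expression: ReadStream on: text from: first to: last."""
    return io.StringIO(text[first - 1:last])

def parse(stream):
    """Hypothetical stand-in for a real parser: it just splits the input into words."""
    return stream.read().split()

source = "GAMS code here | my extension tokens | more GAMS code"
start = source.index("my")                  # 0-based position of the part to parse
end = start + len("my extension tokens")    # 1-based inclusive index of its last character
tokens = parse(substream(source, start + 1, end))
```

The parser never sees the surrounding text; deciding which part of the source it gets is entirely the caller's job.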

You might want to look at how the XML and chunk format parsers provide
source to be compiled, see
ChunkSourceFileFormat>>fileInFrom:
XMLSourceFileFormat>>fileInFrom:
and how the text editor does it, see
ParagraphEditor>>selectionAsStream
ParagraphEditor>>compileSelection

HTH

>
> I'm sorry for any confusion caused; thanks for all the comments.
>
> Regards
> Mike
>


--
_______________,,,^..^,,,____________________________
Eliot Miranda Smalltalk - Scene not herd

John Brant

2004-04-28, 5:41 pm

Michael Bielser wrote:

> Basically, my question was rather a general parser question (How can I
> skip a few lines of the source to parse?). I asked here because I'm
> neither a Smalltalk expert nor did I previously use SmaCC (and I, maybe
> falsely, assumed that there might be some Smalltalk/SmaCC peculiarities
> I had tripped upon, as newcomer).


If you want to skip the first couple of lines, it might be easiest to
first skip the lines in the stream and then parse the rest of the stream
using the SmaCCParser class>>parseStream: message instead of the #parse:
message.
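
As a rough, language-neutral illustration of skipping leading lines before parsing, here is a sketch in Python (not SmaCC code). `skip_lines` and `parse_stream` are hypothetical names; `parse_stream` merely stands in for whatever entry point the generated parser offers:

```python
import io

def skip_lines(stream, count):
    """Consume and discard up to `count` lines; the stream stays positioned after them."""
    for _ in range(count):
        if stream.readline() == "":   # end of input reached early
            break
    return stream

def parse_stream(stream):
    """Hypothetical stand-in for the parser: collects the remaining non-empty lines."""
    return [line.strip() for line in stream if line.strip()]

source = io.StringIO("* GAMS header line\n* another GAMS line\nextension A\nextension B\n")
result = parse_stream(skip_lines(source, 2))
```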

If the lines are intermixed with other code you want to parse, you may
want to create a special token in the scanner that ignores the lines
(e.g., see the <whitespace> and <comment> tokens in the StScanner
definition). If you can't ignore tokens since they are also used in
stuff you want to parse, then you'll probably need to modify your
grammar with a production that ignores tokens when it enters some state.
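
The "special token that is ignored" idea can be sketched outside SmaCC as well. The Python example below is only an illustration under strong assumptions: the extension tokens are taken to start with `@` (an invented convention, not GAMS or SmaCC syntax), and everything else is swallowed by a catch-all SKIP pattern that plays the role of whitespace:

```python
import re

# Hypothetical token definitions for the *new* language elements only.
# SKIP comes last, so it matches only where no real token does.
TOKEN_SPEC = [
    ("DIRECTIVE", r"@[A-Za-z_]\w*"),
    ("NUMBER",    r"@\d+"),
    ("SKIP",      r"[^@]+|@"),   # any run of base-language text, or a stray '@'
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    """Yield (token_name, lexeme) pairs, silently dropping everything matched by SKIP."""
    for match in MASTER.finditer(text):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()

source = "Set i /1*3/;\n@scenario @42\nParameter p(i);\n@link\n"
tokens = list(scan(source))
```

This only works while the new tokens cannot be confused with base-language text; once they share lexemes, the grammar-level approach described above is needed instead.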

Hope this helps. If you still need help, you may wish to post an example
of what you are wanting.


John Brant

Mark van Gulik

2004-04-29, 9:26 pm

Michael Bielser <mike.bielser@ior.unizh.ch> wrote in message news:<408F42F9.7020501@ior.unizh.ch>...
> Let me summarize:

[...]
> Basically, my question was rather a general parser question (How can I
> skip a few lines of the source to parse?). I asked here because I'm
> neither a Smalltalk expert nor did I previously use SmaCC (and I, maybe
> falsely, assumed that there might be some Smalltalk/SmaCC peculiarities
> I had tripped upon, as newcomer).

[...]

Let us know if this makes sense: Write a preprocessor for the
language that splits the source into two separate
files: GAMS-only and GAMS-free. Then process the GAMS-only file
using GAMS syntax, and feed the GAMS-free file to the other language's
parser/compiler. It might be really easy to extract the GAMS part of
the program if there is some way of identifying it (brace brackets,
comments that start with a special symbol, etc). This is pretty much
the approach commonly used with embedded SQL. Well, actually, there the
preprocessor takes a "marked up" program and converts it into a
program in which the markups have been rewritten as stylized
expressions of the base language.
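
A minimal sketch of such a splitting preprocessor, in Python for brevity. It assumes the extension code sits between marker lines; the markers `*ext-begin` and `*ext-end` are invented for this example and are not real GAMS directives. For simplicity it returns two strings instead of writing two files:

```python
BEGIN, END = "*ext-begin", "*ext-end"   # hypothetical marker lines

def split_source(text):
    """Split text into (gams_only, gams_free) strings, dropping the marker lines."""
    gams_only, gams_free = [], []
    inside_extension = False
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped == BEGIN:
            inside_extension = True
        elif stripped == END:
            inside_extension = False
        elif inside_extension:
            gams_free.append(line)    # belongs to the new language elements
        else:
            gams_only.append(line)    # plain base-language source
    return "".join(gams_only), "".join(gams_free)

source = "Set i /1*3/;\n*ext-begin\nscenario base\n*ext-end\nParameter p(i);\n"
gams_only, gams_free = split_source(source)
```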