For Programmers: Free Programming Magazines  


Home > Archive > Scheme > April 2006 > What is the fastest method of parsing scheme?









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author What is the fastest method of parsing scheme?
Adrian

2006-04-07, 8:03 am

Hi all,

People have been writing scheme parsers for decades now, and I was
wondering if there is a consensus on what is the fastest and/or most
efficient method to parse scheme? What method, theoretical or
practical, leads to the fastest scheme parsers?

thx
- Adrian B

William D Clinger

2006-04-07, 7:04 pm

> People have been writing scheme parsers for decades now, and I was
> wondering if there is a consensus on what is the fastest and/or most
> efficient method to parse scheme? What method, theoretical or
> practical, leads to the fastest scheme parsers?


I can't think of anything that's likely to run faster than the
READ procedure.

Will

Adrian

2006-04-07, 7:04 pm


Maybe I should clarify... What I am interested in is what years of
experience have taught the development community about the
*implementation* of fast scheme parsers (but not anything beyond that
such as interpretation). Is there a generally accepted
performance-optimal method for parsing the language? What
implementation-tricks can be used to speed up parsing? etc. I am
thinking of this more from an implementation perspective using a
language like C or C++, but not specifically.

Cheers,
Adrian

Jens Axel Søgaard

2006-04-07, 7:04 pm

Adrian wrote:
> Maybe I should clarify... What I am interested in is what years of
> experience have taught the development community about the
> *implementation* of fast scheme parsers (but not anything beyond that
> such as interpretation). Is there a generally accepted
> performance-optimal method for parsing the language? What
> implementation-tricks can be used to speed up parsing? etc. I am
> thinking of this more from an implementation perspective using a
> language like C or C++, but not specifically.


The grammar of Scheme is quite straightforward. Therefore: Use a
parser generator. It is seldom worthwhile to write your own
parser.

--
Jens Axel Søgaard

William D Clinger

2006-04-07, 7:04 pm

Adrian wrote:
> Maybe I should clarify... What I am interested in is what years of
> experience have taught the development community about the
> *implementation* of fast scheme parsers (but not anything beyond that
> such as interpretation). Is there a generally accepted
> performance-optimal method for parsing the language?


The parsing problem for Scheme is Omega(n), where
n is the number of characters in the input. From
that it follows that any O(n) parsing algorithm
is within a constant factor of the best possible
performance. In most implementations, the read
procedure is implemented by a recursive descent
parser using one character of lookahead, although
a second character of lookahead may be used in a
few places.

> What
> implementation-tricks can be used to speed up parsing? etc. I am
> thinking of this more from an implementation perspective using a
> language like C or C++, but not specifically.


I think the most important trick is to write the
parser in Scheme instead of C or C++.

Consider the following benchmark results, obtained
on a SunBlade 1500 with no other users. All times
are in seconds, and are the average of three runs:

Chez Larceny MzScheme scheme48
******** ******** ******** ********
benchmark1 0.247 0.569 2.587 2.62
benchmark2 0.330 1.653 1.821 3.82

benchmark1 just uses read-char to read "nboyer.sch"
100 times, with no parsing at all. benchmark2 just
uses read to parse "nboyer.sch" 100 times.

In the two fastest systems, the read procedure is
written in Scheme. It therefore appears that the
most important trick, if you're after speed, is to
write the parser in Scheme instead of C or C++.

This trick doesn't always work, however, as shown
by the timing for scheme48. I suspect that scheme48's
read procedure is written in Scheme also, but it's
the slowest of the bunch.

MzScheme's read procedure is written in C or C++,
and runs faster than you can read the individual
characters from the file. That indicates a problem
with the performance of read-char in MzScheme; that
problem was one of the reasons they chose to write
the read procedure in C or C++ instead of Scheme.

Will

Ray Blaak

2006-04-07, 7:04 pm

Jens Axel Søgaard <usenet@soegaard.net> writes:
> The grammar of Scheme is quite straightforward. Therefore: Use a
> parser generator. It is seldom worthwhile to write your own
> parser.


I haven't parsed scheme before, but I have parsed enough C, Java, Pascal, and
XML to know that for me it is always worthwhile to write your own parser. Not
from scratch, of course, one needs to have a suitable toolkit to build on,
essentially a decent tokenizer library.

The point though is that I have found the hard way that the effort of
specifying actions for grammar rules and providing the glue code to actually
use the parsing results is not much different from simply doing it all
yourself.

Properly designed parser code will have the appropriate methods that
correspond to the equivalent grammar elements, the result being that the
parsing code is just as readable as a grammar file itself.

There is the added benefit of flexiblity. Need n-token look-ahead for a
particular element? Something that is not quite context independent? Code
the custom work on the spot -- no tool limitations to deal with.

Versioning and licensing problems in using 3rd party parsing tools also just
go away.

The result is something that is as lightweight and flexible as I need it to
be, something that works perfectly with anything else I use since I have
complete control to change anything as needed.

--
Cheers, The Rhythm is around me,
The Rhythm has control.
Ray Blaak The Rhythm is inside me,
rAYblaaK@STRIPCAPStelus.net The Rhythm has my soul.
Abdulaziz Ghuloum

2006-04-07, 7:04 pm

William D Clinger wrote:

> In the two fastest systems, the read procedure is
> written in Scheme. It therefore appears that the
> most important trick, if you're after speed, is to
> write the parser in Scheme instead of C or C++.


And pick a decent Scheme compiler, yes.

> MzScheme's read procedure is written in C or C++,
> and runs faster than you can read the individual
> characters from the file. That indicates a problem
> with the performance of read-char in MzScheme; that
> problem was one of the reasons they chose to write
> the read procedure in C or C++ instead of Scheme.


But shouldn't we all code in C for efficiency? ;)

Aziz,,,
William D Clinger

2006-04-07, 10:02 pm

> But shouldn't we all code in C for efficiency? ;)
>
> Aziz,,,


Of course. Even freshmen know that!

Will

Ray Dillinger

2006-04-08, 4:01 am

Adrian wrote:
> Hi all,
>
> People have been writing scheme parsers for decades now, and I was
> wondering if there is a consensus on what is the fastest and/or most
> efficient method to parse scheme? What method, theoretical or
> practical, leads to the fastest scheme parsers?
>
> thx
> - Adrian B



Use Flex or similar to generate a lexer. Parsing is profoundly
easy in Lispy languages, and besides you want some ability to
generate, interpret, compile, or dynamic-load code at runtime
which complicates parser toolkits immensely, so it's not IMO
worthwhile to use a toolkit such as YACC to build a parser; it
was easier and simpler to make my own by hand.

Bear

Abdulaziz Ghuloum

2006-04-08, 7:02 pm

William D Clinger wrote:
> Chez Larceny MzScheme scheme48
> ******** ******** ******** ********
> benchmark1 0.247 0.569 2.587 2.62
> benchmark2 0.330 1.653 1.821 3.82
>
> benchmark1 just uses read-char to read "nboyer.sch"
> 100 times, with no parsing at all. benchmark2 just
> uses read to parse "nboyer.sch" 100 times.
>
> In the two fastest systems, the read procedure is
> written in Scheme. It therefore appears that the
> most important trick, if you're after speed, is to
> write the parser in Scheme instead of C or C++.



In Chez 6.9c, on my iBook (800MHz G4), I get:
(time (many test-read ...))
2 collections
290 ms elapsed cpu time, including 10 ms collecting
405 ms elapsed real time, including 2 ms collecting
2665248 bytes allocated, including 2240992 bytes reclaimed
(time (many test-read-char ...))
no collections
80 ms elapsed cpu time
182 ms elapsed real time
108000 bytes allocated

I think Chez open-codes read-char at optimize-level 3. This
makes Chez, Larceny, and also my little compiler perform
read-char about only 3~4 times faster than read (for nboyer.sch,
100 times).

With the following numbers (if there is a way to scale them by
having Chez's times as the baseline), I would probably say the
3 fastest systems have their read implemented in Scheme, which
confirms your hypothesis.

(time (many test-read ...))
performed 6 collections
utime: 0.980s including gc: 0.010s
stime: 0.040s including gc: 0.000s
total: 1.020s including gc: 0.010s
rtime: 1.181s including gc: 0.018s

(time (many test-read-char ...))
performed 0 collections
utime: 0.210s including gc: 0.000s
stime: 0.030s including gc: 0.000s
total: 0.240s including gc: 0.000s
rtime: 0.315s including gc: 0.000s

Aziz,,,
Abdulaziz Ghuloum

2006-04-08, 7:02 pm

Adrian wrote:

> Maybe I should clarify... What I am interested in is what years of
> experience have taught the development community about the
> *implementation* of fast scheme parsers (but not anything beyond that
> such as interpretation). Is there a generally accepted
> performance-optimal method for parsing the language? What
> implementation-tricks can be used to speed up parsing? etc. I am
> thinking of this more from an implementation perspective using a
> language like C or C++, but not specifically.
>
> Cheers,
> Adrian


Reading scheme involves two procedures: a lexer and a parser. The lexer
turns the stream of characters into tokens (such as left-paren, the
integer 42, the string "foobar", #f, etc.). The parser consumes these
tokens one at a time and assembles them into a (possibly nested) list.

When I tried implementing a reader for Scheme, I first wrote the lexer
in the most straight-forward fashion you can think of. Here are the
first few lines of the main lexer procedure. Essentially, it's a state
machine implemented using many mutually recursive procedures. For
example, when the lexer finds the double-quote character, it calls the
Str helper, which consumes all the characters up to the closing
double-quote (taking care of escape characters), and finally returns
(datum . "some string") for the parser to use. The entire lexer is
about 300 lines of Scheme.

(define lexer
(lambda (p)
(let ([c (read-char p)])
(cond
[(eof-object? c) 'eof]
[(char=? c #\() 'lp]
[(char=? c #\)) 'rp]
[(char=? c #\[) 'lb]
[(char=? c #\]) 'rb]
[(char-whitespace? c) (lexer p)]
[(char-numeric? c) (Num p (cons c '()))]
[(sign-char? c) (Sign p c)]
[(char=? c #\") (Str p '())]
[(char=? c #\;) (Semi p)]
[(char=? c #\.) (Dot p)]
[(char=? c #') 'quote]
[(char=? c #\`) 'quasiquote]
[(char=? c #\,) (Unq p)]
[(char=? c #\#) (Hash p)]
[(char-symbolic? c) (Sym p (cons c '()))]
...

The parser is much simpler than the lexer because it does not have to
deal with all the special cases of the syntax. (ie. there are a zillion
things that start with # in my reader: #t, #f, #!eof, #(), #[], #23(23),
#\n, #\newline, #1=(#1#), #%prim, #2%prim, #3%prim, etc. I don't have
the full numeric tower so I don't deal with all the quirks of parsing
numbers yet). So, the parser is very small (around 80 lines of Scheme)
and very easy to write. Overall, the whole reader business is very
straightforward to do, and makes up very small percentage of any
implementation (less than 1%). I believe this is the fastest and
simplest way to parse scheme. Here are the first few lines of my
parser:

(define parse
(lambda (p)
(let ([t (get-token p)])
(case t
[(eof) '#!eof]
[(lp) (parse-list p)]
[(lb) (parse-list-b p)]
[(dot) (error 'read "unexpected dot")]
[(quote) (list 'quote (parse p))]
[(hash-quote) (list 'syntax (parse p))]
[(quasiquote) (list 'quasiquote (parse p))]
[(unquote) (list 'unquote (parse p))]
[(unquote-splicing) (list 'unquote-splicing (parse p))]
...

I hope this helps.

Aziz,,,
Jens Axel Søgaard

2006-04-08, 7:02 pm

Abdulaziz Ghuloum wrote:
> Adrian wrote:
>
>
> Reading scheme involves two procedures: a lexer and a parser. The lexer
> turns the stream of characters into tokens (such as left-paren, the
> integer 42, the string "foobar", #f, etc.). The parser consumes these
> tokens one at a time and assembles them into a (possibly nested) list.
>
> When I tried implementing a reader for Scheme, I first wrote the lexer
> in the most straight-forward fashion you can think of. Here are the
> first few lines of the main lexer procedure. Essentially, it's a state
> machine implemented using many mutually recursive procedures. For
> example, when the lexer finds the double-quote character, it calls the
> Str helper, which consumes all the characters up to the closing
> double-quote (taking care of escape characters), and finally returns
> (datum . "some string") for the parser to use. The entire lexer is
> about 300 lines of Scheme.
>
> (define lexer
> (lambda (p)
> (let ([c (read-char p)])
> (cond
> [(eof-object? c) 'eof]
> [(char=? c #\() 'lp]


I suddenly remember the perfect place to start for Adrian.

The Summer '96 Scheme Workshop were on compilers. Eric Hilsdale and
Christopher Haynes were the instructors.

The Scanner/parser is described in
<http://www.cs.indiana.edu/eip/compile/scanparse.html>

The main page is here:
<http://www.cs.indiana.edu/eip/compile/>

The road map is here:
<http://www.cs.indiana.edu/eip/compile/roadmap.html>

And the coomplete code is here:
<http://www.cs.indiana.edu/eip/compile/code.html>

Note that the compiler produces sparc assembler - so they naturally
included a Sparc emulator written in Scheme...


Related papers:

"Destination-Driven Code Generation"
R. Kent Dybvig, Robert Hieb and Tom Butler.
<http://www.cs.indiana.edu/~dyb/papers/ddcg.ps>

"Compiler Construction Using Scheme".
Erik Hilsdale, J. Michael Ashley, R. Kent Dybvig and Daniel P. Friedman.
<http://repository.readscheme.org/ft...hley/fple95.pdf>


--
Jens Axel Søgaard
Abdulaziz Ghuloum

2006-04-08, 7:02 pm

Jens Axel Søgaard wrote:

> I suddenly remember the perfect place to start for Adrian.
>
> The Summer '96 Scheme Workshop were on compilers. Eric Hilsdale and
> Christopher Haynes were the instructors.
>
> The Scanner/parser is described in
> <http://www.cs.indiana.edu/eip/compile/scanparse.html>
>
> The main page is here:
> <http://www.cs.indiana.edu/eip/compile/>
>
> The road map is here:
> <http://www.cs.indiana.edu/eip/compile/roadmap.html>
>
> And the coomplete code is here:
> <http://www.cs.indiana.edu/eip/compile/code.html>
>
> Note that the compiler produces sparc assembler - so they naturally
> included a Sparc emulator written in Scheme...


This is kind of, ..., what you call it, freaky. I have never seen these
pages before yet the code looks almost identical to the code I have in
my compiler. There must be only one way to write Scheme parsers in
Scheme. So, I agree. It's the perfect place to start for Adrian.



> Related papers:
>
> "Destination-Driven Code Generation"
> R. Kent Dybvig, Robert Hieb and Tom Butler.
> <http://www.cs.indiana.edu/~dyb/papers/ddcg.ps>
>
> "Compiler Construction Using Scheme".
> Erik Hilsdale, J. Michael Ashley, R. Kent Dybvig and Daniel P. Friedman.
> <http://repository.readscheme.org/ft...hley/fple95.pdf>


These I've seen, and they are very good papers to read. Just for the
record, the current undergraduate compilers course in IU extends the
5-pass compiler described in the second paper to a 50-pass compiler
that performs many optimizations and uses the graph-coloring register
allocator instead of the destination-driven code generator. The list
of passes making the "P423-compiler" is in:

"A Nanopass Framework for Compiler Education"
Dipanwita Sarkar, Oscar Waddell, and R. Kent Dybvig.
Journal of Functional Programming -- Educational Pearl. (To appear.)
http://www.cs.indiana.edu/~dyb/pubs/nano-jfp.pdf

Aziz,,,
Eli Barzilay

2006-04-08, 7:02 pm

"William D Clinger" <cesura17@yahoo.com> writes:

> [...]
> Consider the following benchmark results, obtained
> on a SunBlade 1500 with no other users. All times
> are in seconds, and are the average of three runs:
>
> Chez Larceny MzScheme scheme48
> ******** ******** ******** ********
> benchmark1 0.247 0.569 2.587 2.62
> benchmark2 0.330 1.653 1.821 3.82
>
> benchmark1 just uses read-char to read "nboyer.sch"
> 100 times, with no parsing at all. benchmark2 just
> uses read to parse "nboyer.sch" 100 times.


Comparing them is only useful if they all do the same thing, which is
wrong in this case. The final product of read/read-char/etc would be
the same, but making the process of getting that result deal with
things like non-blocking IO and Unicode can add enough noise to make
these numbers useless.

On top of that, the actual act of reading a character is something
that you must take all the way down to the OS level -- so I tried a
"raw" implementation of benchmark1: a C program that uses system calls
to read the file. The numbers I got (Linux on an Intel machine, 200
reads in each run, median of three runs) are 1.25sec for `read-char',
and 4.27sec for the system call version (with about 25% spent on user
code). So you obviously need some form of buffering, which adds an
additional layer of noise on the numbers. (BTW, benchmark2 runs
slightly slower for me: 1.28sec.)

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!
Lauri Alanko

2006-04-08, 7:02 pm

In article <e18nge$lim$1@rainier.uits.indiana.edu>,
Abdulaziz Ghuloum <aghuloum@c-s-remove-dashes.indiana.edu> wrote:
> "A Nanopass Framework for Compiler Education"
> Dipanwita Sarkar, Oscar Waddell, and R. Kent Dybvig.
> Journal of Functional Programming -- Educational Pearl. (To appear.)


It appeared in 15(5):653-668.

I'm still anxiously waiting for them to release the code...


Lauri
Jens Axel Søgaard

2006-04-08, 7:02 pm

Abdulaziz Ghuloum wrote:
> Jens Axel Søgaard wrote:
>
[color=darkred]
> This is kind of, ..., what you call it, freaky. I have never seen these
> pages before yet the code looks almost identical to the code I have in
> my compiler. There must be only one way to write Scheme parsers in
> Scheme. So, I agree. It's the perfect place to start for Adrian.


;-)

>
> These I've seen, and they are very good papers to read.


Indeed. There were several ideas in the workshop code that fell
into place when I read the "Destination-Driven Code Generation"
paper.

--
Jens Axel Søgaard
Abdulaziz Ghuloum

2006-04-08, 7:02 pm

Eli Barzilay wrote:

> "William D Clinger" <cesura17@yahoo.com> writes:
>
>
>
>
> Comparing them is only useful if they all do the same thing, which is
> wrong in this case. The final product of read/read-char/etc would be
> the same,


And that's what counts for the user: the final product.

> but making the process of getting that result deal with
> things like non-blocking IO and Unicode can add enough noise to make
> these numbers useless.


That's an implementation detail. The end user doesn't care how you get
the chars. I don't think Unicode adds 5~10-fold overhead but I could be
wrong.

> On top of that, the actual act of reading a character is something
> that you must take all the way down to the OS level


Details and not true for buffered IO.

> -- so I tried a
> "raw" implementation of benchmark1: a C program that uses system calls
> to read the file. The numbers I got (Linux on an Intel machine, 200
> reads in each run, median of three runs) are 1.25sec for `read-char',
> and 4.27sec for the system call version (with about 25% spent on user
> code).


Are you saying that C makes a terrible language to implement IO? No
wonder then that the faster implementations do their IO in Scheme.

> So you obviously need some form of buffering, which adds an
> additional layer of noise on the numbers. (BTW, benchmark2 runs
> slightly slower for me: 1.28sec.)


Yes, buffering makes IO faster. Doesn't MzScheme use buffered IO?

Aziz,,,
Eli Barzilay

2006-04-08, 7:02 pm

Abdulaziz Ghuloum <aghuloum@c-s-remove-dashes.indiana.edu> writes:

> Eli Barzilay wrote:
>
>
>
> And that's what counts for the user: the final product.


Depends on your definition of "final product", and according to the
definition I used, then what counts is much more than the final
product. For example, I might care that the rest of the system
provides certain features even if `read-char' is slow as a result. I
might care about the behavior of the system if `read-char' is used on
an input terminal which is waiting for input (= if it busy-waits, I
don't want it). This whole thing about measuring benchmarks shows
that Will cares about the speed too.


>
> That's an implementation detail. The end user doesn't care how you
> get the chars. [...]


Same as above: some users defeinitely care that you *do* get unicode
chars.


>
> Are you saying that C makes a terrible language to implement IO? No
> wonder then that the faster implementations do their IO in Scheme.


You know, I'm all for Scheme propaganda, but this is plain silly. I
wrote some trivial C code to measure char-by-char reading speed. It
took more than 4 seconds, and 3 of that was spent in system time.
That was my point.


>
> Yes, buffering makes IO faster. Doesn't MzScheme use buffered IO?


Of course -- using read-char was faster than the raw system-calls
code. Since all Scheme's have a faster than system-call `read-char',
then they all use some buffering. This means that there is another
layer of dealing with IO before `read-char' -- that layer can be
implemented in very different ways (what kind of buffering? how big?
what system call is used to read/write chunks?). That layer is
probably different in different implementation, which means more noise
that in this case can be large enough to render the original measures
meaningless.

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!
William D Clinger

2006-04-08, 7:02 pm

Eli Barzilay quoting Abdulaziz Ghuloum:
>
> Same as above: some users defeinitely care that you *do* get unicode
> chars.


Let's get real. Do you honestly believe that Unicode is the
reason MzScheme's read-char is so slow?

I'll give you a hint: MzScheme's read-char is about as fast
as scheme48's read-char.

> You know, I'm all for Scheme propaganda, but this is plain silly.


We're having fun here, but our point is that it is plain
silly to think you can improve the speed of a parser by
writing it in C or C++. A secondary point is that the
main reason freshpeople *think* writing it in C or C++
is faster is because they are accustomed to comparing the
speed of compiled C/C++ against interpreted Scheme.

> Of course -- using read-char was faster than the raw system-calls
> code. Since all Scheme's have a faster than system-call `read-char',
> then they all use some buffering. This means that there is another
> layer of dealing with IO before `read-char' -- that layer can be
> implemented in very different ways (what kind of buffering? how big?
> what system call is used to read/write chunks?). That layer is
> probably different in different implementation, which means more noise
> that in this case can be large enough to render the original measures
> meaningless.


Do you honestly believe MzScheme's read-char is slow
because of poor buffering?

Will

Lauri Alanko

2006-04-08, 7:02 pm

In article <44368002$0$38672$edfadb0f@dread12.news.tele.dk>,
Jens Axel Søgaard <usenet@soegaard.net> wrote:
> The grammar of Scheme is quite straightforward. Therefore: Use a
> parser generator. It is seldom worthwhile to write your own
> parser.


Parser generators often produce bottom-up parsers, which are pretty
difficult to modify on the fly. So if run-time reader extensions are
desired, then a custom recursive-descent parser makes more sense,
especially since the s-expression grammar is (easiry transformed into)
LL(1).

Of course an extensible LL(1) parser could be produced by a generator,
too, but I don't think such tools are very common.


Lauri
Abdulaziz Ghuloum

2006-04-08, 7:02 pm

Eli Barzilay wrote:

> Depends on your definition of "final product", and according to the
> definition I used, then what counts is much more than the final
> product. For example, I might care that the rest of the system
> provides certain features even if `read-char' is slow as a result. I
> might care about the behavior of the system if `read-char' is used on
> an input terminal which is waiting for input (= if it busy-waits, I
> don't want it). This whole thing about measuring benchmarks shows
> that Will cares about the speed too.


So, how much of the overhead of read-char is due to using non-blocking
IO in MzScheme?

>
> Same as above: some users defeinitely care that you *do* get unicode
> chars.


How much does Unicode support cost read-char? I honestly would like to
know.


>
> You know, I'm all for Scheme propaganda, but this is plain silly. I
> wrote some trivial C code to measure char-by-char reading speed. It
> took more than 4 seconds, and 3 of that was spent in system time.
> That was my point.


I was not talking about ``Oh Scheme is the best/fastest language''. I'm
not for any propaganda. It's just that in my experience, I can squeeze
more performance out of Scheme when given a decent implementation than I
can using C. Consequently, I was agreeing with Will that implementing a
Scheme parser in Scheme is the key to getting good performance.

I still don't know what your point is regarding the trivial "raw" C code
that you wrote that spends 75% of its time in the system. My compiler
spends 15% of its time in the system. Yes, it's a different system, but
what's your point? You are not blaming the system for spending that
time, are you?

>
> Of course -- using read-char was faster than the raw system-calls
> code. Since all Scheme's have a faster than system-call `read-char',
> then they all use some buffering. This means that there is another
> layer of dealing with IO before `read-char' -- that layer can be
> implemented in very different ways (what kind of buffering? how big?
> what system call is used to read/write chunks?). That layer is
> probably different in different implementation, which means more noise
> that in this case can be large enough to render the original measures
> meaningless.


Maybe I'm very slow today. Are you saying that if one system spends
0.247 secs reading chars and another spends 2.587 secs, then you cannot
say anything about their performance and the whole difference in the
measurement is noise?

Aziz,,,
Eli Barzilay

2006-04-08, 7:02 pm

"William D Clinger" <cesura17@yahoo.com> writes:

> Let's get real. Do you honestly believe that Unicode is the reason
> MzScheme's read-char is so slow?


No.


>
> We're having fun here, but our point is that it is plain silly to
> think you can improve the speed of a parser by writing it in C or
> C++.


I never said anything about parsers that use C/++ vs ones that use
Scheme. I only said that comparing IO behavior from implementations
that have wildly varying capabilities does not make much of a point.


> A secondary point is that the main reason freshpeople *think*
> writing it in C or C++ is faster is because they are accustomed to
> comparing the speed of compiled C/C++ against interpreted Scheme.


That's sort of right (I don't think that there are any major Scheme
implementation alive that do straightforward interpretation), but in
any case I had no argument in that direction. (And if I would make
any point in that area, then it should be clear what I'd favor.)


>
> Do you honestly believe MzScheme's read-char is slow because of poor
> buffering?


Not directly, but I believe that the IO capabilities I get in MzScheme
are rich enough that comparing it's `read-char' is not fair. First of
all, I get enough higher-level primitives that I think it has been
years since I used `read-char'. Secondly, even if doing some things
in C makes them faster, I'd still go with Scheme, and the exact same
argument holds for choosing a Scheme implementation -- there are some
things that make me choose an implementation even if it's slower.


Abdulaziz Ghuloum <aghuloum@c-s-remove-dashes.indiana.edu> writes:

> So, how much of the overhead of read-char is due to using
> non-blocking IO in MzScheme?


I really don't know. The IO part of MzScheme is exceptionally hairy
(esp given that it works on different OSs). The only way I can think
of doing a fair comparison is to use the same underlying IO
implementation, and that's way too much work. (I actually made a
brief attempt with a Scheme interface for reading characters through a
raw system call, and then I discovered the horrendous time spent when
you do that.)


>
> How much does Unicode support cost read-char? I honestly would like
> to know.


That's easy to try -- you just need to compare `read-char' and
`read-byte'. IIRC, the extra cost was indeed negligible, but like I
said above I just brought it as an example that a naive time
comparison is not saying much. (In any case, `read-char' is not as
useful when you have what you get with MzScheme.)


> I was not talking about ``Oh Scheme is the best/fastest language''.
> I'm not for any propaganda. It's just that in my experience, I can
> squeeze more performance out of Scheme when given a decent
> implementation than I can using C. Consequently, I was agreeing
> with Will that implementing a Scheme parser in Scheme is the key to
> getting good performance.


I'm really not convinced about that, certainly not by the posted
numbers. A bad Scheme programmer is still a bad programmer and will
generate a crappy parser, and a good C programmer is still a good C
programmer and will generate a good parser. For a fair comparison,
I'd sit down and go over a Scheme parser, and then I'd translate the
code to C -- that can get to a proper answer for why a Scheme faster
should be faster than a C parser, or to comparable C code that is
natural enough to have been used in the first place.


> I still don't know what your point is regarding the trivial "raw" C
> code that you wrote that spends 75% of its time in the system. My
> compiler spends 15% of its time in the system. Yes, it's a
> different system, but what's your point? You are not blaming the
> system for spending that time, are you?


No, I'm saying that there must be some buffering, and that buffering
will have a huge effect on the time, and since each implementation is
likely to use a different buffering the noise that is inserted to
timing comparisons makes it useless.


> Maybe I'm very slow today. Are you saying that if one system spends
> 0.247 secs reading chars and another spends 2.587 secs, then you cannot
> say anything about their performance and the whole difference in the
> measurement is noise?


The numbers were 0.330, 1.653, 1.821, 3.82, and yes, I'm saying that
using these numbers to draw such strong conclusions as "Scheme parsers
are faster than C parsers" is bogus.

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!
Adrian

2006-04-08, 10:03 pm


Jens Axel S=F8gaard wrote:

> "Compiler Construction Using Scheme".
> Erik Hilsdale, J. Michael Ashley, R. Kent Dybvig and Daniel P. Friedman.
> <http://repository.readscheme.org/ft...hley/fple95.pdf>



Thanks for the links! I've been reading through some of these today
with a focus on scanning and parsing rather than code generation. The
above paper specifies a scanner using a deterministic finite automata
and a recursive descent parser. Are these considered the fastest way
to scan and parse scheme code? Or are they used in the paper simply
because they are either the simplest, or most elegant and
instructional?

Adrian

2006-04-08, 10:03 pm

Putting aside the argument over implementation language, is there a way
we can measure the performance of scanning and parsing methods
independent of the underlying IO implementations? My original question
was driven by an interest in what people beleive are the optimal
scanning and parsing techniques independent of other factors such as
interpretation, code generation, etc.

Abdulaziz Ghuloum

2006-04-08, 10:03 pm

Eli Barzilay wrote:
> ...


I guess you're not getting the clues, so, let me explain.

MzScheme is not slow because of Unicode. MzScheme is not slow because
of bad IO buffers management. MzScheme is not slow because it targets
many platforms. MzScheme is slow because it's an interpreter. Whether
you want to call it a straight-forward interpreter or an optimizing
interpreter, it's still an interpreter. All interpreters suffer the
same artifact: implementor's code is fast and user's code is slow.

Let me spell all the benchmarks I ran using MzScheme and not comparing
it to any other implementation.

1. Reading&Parsing nboyer.sch 100 times:
cpu time: 2600 real time: 2748 gc time: 110

2. Reading nboyer.sch 100 times using read-char:
cpu time: 3560 real time: 3900 gc time: 10
(you said this was due to IO, non-blocking, and Unicode)

3. Reading nboyer.sch 100 times using read-byte:
cpu time: 3330 real time: 3531 gc time: 10
(this eliminates the Unicode overhead, but with the same results)

4. Reading nboyer.sch 100 times from a string-port using read-char:
cpu time: 3710 real time: 3920 gc time: 130
(this eliminates all the IO/nonblocking overhead, but same results)

5. Reading nboyer.sch 100 times from a string-port using read-byte:
cpu time: 3550 real time: 3694 gc time: 130
(this eliminates both Unicode and IO/nonblocking overhead, same results)

6. Iterating through a string containing nboyer.sch 100 times:
cpu time: 3620 real time: 3830 gc time: 0
(this bypasses the entire IO layer, but same results!)

So, MzScheme is slow because it doesn't like executing my code; it would
rather execute code that's built-in. The "I get enough higher-level
primitives that I think it has been years since I used `read-char'" does
not fly because you cannot dream that one implementation will provide
all the primitives that all its users will ever need. Users will always
need to write code to express whatever problem they're trying to solve.
It is how often and soon users of these ``high-level'' interpreters
face reality and get forced down to the C level.

Aziz,,,


PS. Here is my code if you want to rerun under your machine.

(define (test-read)
(let ([p (open-input-file "nboyer.sch")])
(let f ((p p))
(let ([x (read p)])
(cond
[(eof-object? x) (close-input-port p)]
[else (f p)])))))

(define (test-read-char)
(let ([p (open-input-file "nboyer.sch")])
(let f ((p p))
(let ([x (read-char p)])
(cond
[(eof-object? x) (close-input-port p)]
[else (f p)])))))

(define (test-read-byte)
(let ([p (open-input-file "nboyer.sch")])
(let f ((p p))
(let ([x (read-byte p)])
(cond
[(eof-object? x) (close-input-port p)]
[else (f p)])))))

(define (file->string fname)
(let ([p (open-input-file "nboyer.sch")]
[o (open-output-string)])
(let f ((p p))
(let ([x (read-char p)])
(cond
[(eof-object? x)
(close-input-port p)
(get-output-string o)]
[else
(write-char x o)
(f p)])))))

(define nboyer-string (file->string "nboyer.sch"))

(define (test-read-byte)
(let ([p (open-input-file "nboyer.sch")])
(let f ((p p))
(let ([x (read-byte p)])
(cond
[(eof-object? x) (close-input-port p)]
[else (f p)])))))

(define (test-read-char-from-string)
(let ([p (open-input-string nboyer-string)])
(let f ((p p))
(let ([x (read-char p)])
(cond
[(eof-object? x) (close-input-port p)]
[else (f p)])))))

(define (test-read-byte-from-string)
(let ([p (open-input-string nboyer-string)])
(let f ((p p))
(let ([x (read-byte p)])
(cond
[(eof-object? x) (close-input-port p)]
[else (f p)])))))

(define (test-string-ref)
(let f ((i 0))
(cond
[(= i (string-length nboyer-string)) 'nothing]
[else
(string-ref nboyer-string i)
(f (add1 i))])))

(define (many f n)
(unless (zero? n)
(f)
(many f (- n 1))))

(time (many test-read 100))
(time (many test-read-char 100))
(time (many test-read-byte 100))
(time (many test-read-char-from-string 100))
(time (many test-read-byte-from-string 100))
(time (many test-string-ref 100))
Abdulaziz Ghuloum

2006-04-08, 10:03 pm

Eli Barzilay wrote:
> I'm really not convinced about that, certainly not by the posted
> numbers. A bad Scheme programmer is still a bad programmer and will
> generate a crappy parser, and a good C programmer is still a good C
> programmer and will generate a good parser. For a fair comparison,
> I'd sit down and go over a Scheme parser, and then I'd translate the
> code to C -- that can get to a proper answer for why a Scheme faster
> should be faster than a C parser, or to comparable C code that is
> natural enough to have been used in the first place.


Related to the previous post. I'll take it that you are a good Scheme
programmer and a good programmer in general. Can you write a good (as
in fast) parser in Scheme using your implementation of choice? And how
would you go about doing that? This question is to go back on-topic.

Aziz,,,
Eli Barzilay

2006-04-08, 10:03 pm

Abdulaziz Ghuloum <aghuloum@c-s-remove-dashes.indiana.edu> writes:

> Eli Barzilay wrote:
>
> I guess you're not getting the clues, so, let me explain.
>
> MzScheme is not slow because of Unicode. MzScheme is not slow
> because of bad IO buffers management. MzScheme is not slow because
> it targets many platforms. MzScheme is slow because it's an
> interpreter. Whether you want to call it a straight-forward
> interpreter or an optimizing interpreter, it's still an interpreter.


It has always performed compilation to byte code, is it still an
interpreter? Recently Matthew added a lightning-based JIT that
converts the byte code to native code, is it still an interpreter? If
you're answer is still yes, then I wonder what's your definition of
"interpreter".


> Let me spell all the benchmarks I ran using MzScheme and not
> comparing it to any other implementation.
> [...]


Just for your amusement, I ran your code in four forms. Single run
each since I'm getting tired of this, these are the cpu times:

(1) (2) (3) (4)
618 637 620 632
746 728 551 478
694 676 519 446
878 814 644 561
755 756 609 519
815 798 358 100

(1) your original code, as is (all of your tests unmodified)
(2) your code, wrapped in a (module ...)
(3) same as (1), running in the svn version with the lightning jit
(4) same as (2), running in the svn version with the lightning jit

What does that mean in regards to your argument? I really don't know
since I don't know what your argument is. My only argument so far did
not change -- timing different implementations against each other is
mostly useless.


> 4. Reading nboyer.sch 100 times from a string-port using read-char:
> cpu time: 3710 real time: 3920 gc time: 130
> (this eliminates all the IO/nonblocking overhead, but same results)


(It does not -- reading from a string from a user point of view should
be like reading from a file, so string ports are much more than a
simple port+index.)


> So, MzScheme is slow because it doesn't like executing my code; it
> would rather execute code that's built-in. The "I get enough
> higher-level primitives that I think it has been years since I used
> `read-char'" does not fly because you cannot dream that one
> implementation will provide all the primitives that all its users
> will ever need. Users will always need to write code to express
> whatever problem they're trying to solve. It is how often and
> soon users of these ``high-level'' interpreters face reality and get
> forced down to the C level.


I guess you're not getting the clues, so, let me explain. When I said
that I didn't use read-char in years, I wasn't trying to compare
MzScheme's high-levelness to other implementation's
whatever-levelness, I wasn't trying to compare compiled code,
bytecompiled code, and interpreted code, I wasn't trying to compare C
vs Scheme, or even raw C vs C + libc. I only said getting a single
character is unuseful enough that I hadn't used it in years. I would,
obviously, use it if my implementation had only that as a way to read
files. (IIRC, the Linux kernel interface doesn't have a get-char
operation.)

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!
Eli Barzilay

2006-04-08, 10:03 pm

Abdulaziz Ghuloum <aghuloum@c-s-remove-dashes.indiana.edu> writes:

> Eli Barzilay wrote:
>
> Related to the previous post. I'll take it that you are a good
> Scheme programmer and a good programmer in general. Can you write a
> good (as in fast) parser in Scheme using your implementation of
> choice?


I can only assume so, but won't have time for it.


> And how would you go about doing that?


I'd probably start from the reading list that Jens posted, since I
appreciate years of work people have spent in getting fast
implementations.


> This question is to go back on-topic.


Whew, thanks.

(And if I ever would do that, then to get right back to the off-topic,
I'd really want to investigate if there's anything that makes Scheme a
faster language than C for such code. I really, really wouldn't mind
having some concrete points for Scheme in the form of faster code.)

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!
Jens Axel Søgaard

2006-04-09, 4:03 am

Adrian wrote:
> Jens Axel Søgaard wrote:
>
>
> Thanks for the links! I've been reading through some of these today
> with a focus on scanning and parsing rather than code generation. The
> above paper specifies a scanner using a deterministic finite automata
> and a recursive descent parser. Are these considered the fastest way
> to scan and parse scheme code? Or are they used in the paper simply
> because they are either the simplest, or most elegant and
> instructional?


It's standard to use finite automatas as scanners. Some write them
by hand, and others generate them using "lex" or similar.

Recursive descent is a standard method. It is simple and, as Aziz
and Ray writes, easy to extend. Whether or not it is
(theoretically speaking) the fastest method, I don't know. But is
it important? The other phases of the compiler will most likely
me considerably slower. So slow that making the parser 10% faster
hardly will be noticeable by the end user.

Perhaps the other participants in the thread have hard numbers
for their compilers?

--
Jens Axel Søgaard

Abdulaziz Ghuloum

2006-04-09, 7:06 pm

Eli Barzilay wrote:

> I'd really want to investigate if there's anything that makes Scheme a
> faster language than C for such code.


Please do share your findings with us.

Aziz,,,
Ray Dillinger

2006-04-09, 7:06 pm

Lauri Alanko wrote:

> Parser generators often produce bottom-up parsers, which are pretty
> difficult to modify on the fly. So if run-time reader extensions are
> desired, then a custom recursive-descent parser makes more sense,
> especially since the s-expression grammar is (easiry transformed into)
> LL(1).
>
> Of course an extensible LL(1) parser could be produced by a generator,
> too, but I don't think such tools are very common.


You don't really need a tool like that; you can just have a hook that
calls your user-defined code whenever the parser spits out something
as a lexical error and use any kind of base parser you want.

Bear



Abdulaziz Ghuloum

2006-04-09, 7:06 pm

Jens Axel Søgaard wrote:

> Adrian wrote:
>
>
>
> It's standard to use finite automatas as scanners. Some write them
> by hand, and others generate them using "lex" or similar.
>
> Recursive descent is a standard method. It is simple and, as Aziz
> and Ray writes, easy to extend. Whether or not it is
> (theoretically speaking) the fastest method, I don't know. But is
> it important? The other phases of the compiler will most likely
> me considerably slower. So slow that making the parser 10% faster
> hardly will be noticeable by the end user.
>
> Perhaps the other participants in the thread have hard numbers
> for their compilers?



As I probably said elsewhere in the tread, when I implemented the
reader for my compiler, I did not have efficiency in mind. I just
wanted to write the most straight-forward code that takes the least
time to write so that I can get to the other parts of the compiler. I
wrote it as a "throw-away" code thinking that it will eventually be
rewritten once I put the rest of the compiler in good shape. So, it
was written as a quick and dirty solution to get the compiler to read
its source. The parser and the IO library (input/output file/string
ports) were written over a wend perhaps. Needless to say that it
ran very slow at the time. It probably took a few seconds to read its
own source, but it served the purpose.

Fast-forward one year, fix the other parts of the compiler
(generational garbage collector, source-level optimizer, better
register-allocator, ...), and the parser/IO library automagically
became so much faster that there is still no point in rewriting them.
I prefer to optimize the rest of the compiler and let the parser ride
on the benefits rather than optimize the parser (by writing it in C or
whatever) to hide the inadequacies of the compiler. Also, reflecting
on this thread, I see no evidence that there is a faster way to write
a parser than to write it in Scheme and let your compiler do its
magic.

So, to Adrian, I recommend that you just write your parser in the most
straight-forward manner that's correct and pick a decent
implementation that delivers decent performance. I don't know which
implementation to recommend since my experience with good compilers is
limited to Chez Scheme (not free, unless you have access to it from
your school, assuming you're in school) and my compiler (not available
at this point in time). Writing a straight-forward parser would also
allow you to easily move from one implementation to another if you
want to evaluate their relative performance.

Aziz,,,
William D Clinger

2006-04-09, 7:06 pm

Jens Axel S=F8gaard wrote:
> Recursive descent is a standard method. It is simple and, as Aziz
> and Ray writes, easy to extend. Whether or not it is
> (theoretically speaking) the fastest method, I don't know.


Recursive descent parsing is O(n), which is the fastest
possible. There is a moderately large (and now obsolete)
number of papers that claim recursive descent (and other
forms of LL(k) parsing) is faster than LR parsing, and an
equally large (and equally obsolete) literature claiming
the opposite. It is now generally accepted that all of
the standard O(n) methods, including recursive descent,
are equally fast in theory, and that any differences
that might be observed in practice are attributable to
interactions between programming style, the compilers
used to compile the parser, and the hardware used to run
the parser.

For those who haven't understood why so much of this thread
has been about performance of compilers and interpreters:
that is why. The parsing algorithm just doesn't matter
very much, so long as it's O(n).

> But is
> it important? The other phases of the compiler will most likely
> me considerably slower. So slow that making the parser 10% faster
> hardly will be noticeable by the end user.


That is the main reason I cited hard numbers. In most
systems, the speed of parsing is limited by the speed of
character i/o.

MzScheme and Larceny are two exceptions. The fact that
MzScheme's parser is *faster* than character i/o is both
amusing and instructive. The fact that Larceny's parser
is so slow, yet still outruns a parser written in C/C++,
is also instructive.

> Perhaps the other participants in the thread have hard numbers
> for their compilers?


We can use load to approximate the compile time, because
"nboyer.sch" performs virtually no computation as it is
loaded. Average of 3 runs on a slower Solaris machine
than the one I used last time:

Chez Larceny MzScheme
Scheme fast-safe (default)
opt=3D2

load 4.54 89.12 11.42
read .68 3.35 3.67
read-char (file) .37 .99 4.56
read-char (string) .22 1.08 4.68
string-ref .78 .12 4.73

read/load 15 % 4 % 32 %


Given Larceny's very slow compiler, its slow parser is fast
enough. Larceny's parser would not be fast enough for Chez
Scheme, however, because Chez Scheme's compiler is extremely
fast.

In the opinion of those who implemented MzScheme, a parser
written in interpreted MzScheme would not be fast enough.
Even though MzScheme's parser is written in C/C++, it still
accounts for a third of the compile time.

Will

****************************************
************************

(define (test-load)
(load "nboyer.sch"))

(define (test-read)
(let ((p (open-input-file "nboyer.sch")))
(let f ((p p))
(let ((x (read p)))
(cond
((eof-object? x) (close-input-port p))
(else (f p)))))))

(define (test-read-char)
(let ((p (open-input-file "nboyer.sch")))
(let f ((p p))
(let ((x (read-char p)))
(cond
((eof-object? x) (close-input-port p))
(else (f p)))))))

(define (file->string fname)
(let ((p (open-input-file "nboyer.sch"))
(o (open-output-string)))
(let f ((p p))
(let ((x (read-char p)))
(cond
((eof-object? x)
(close-input-port p)
(get-output-string o))
(else
(write-char x o)
(f p)))))))

(define nboyer-string (file->string "nboyer.sch"))

(define (test-read-char-from-string)
(let ((p (open-input-string nboyer-string)))
(let f ((p p))
(let ((x (read-char p)))
(cond
((eof-object? x) (close-input-port p))
(else (f p)))))))

(define (test-read-byte-from-string)
(let ((p (open-input-string nboyer-string)))
(let f ((p p))
(let ((x (read-byte p)))
(cond
((eof-object? x) (close-input-port p))
(else (f p)))))))

(define (test-string-ref)
(let f ((i 0))
(cond
((=3D i (string-length nboyer-string)) 'nothing)
(else
(string-ref nboyer-string i)
(f (+ 1 i))))))

(define (many f n)
(if (not (zero? n))
(begin (f)
(many f (- n 1)))))

(define (run-benchmarks n)
(define (announce fname)
(display "Benchmarking ")
(write fname)
(newline))
(many (lambda ()
(announce "load")
(time (many test-load 100))
(announce "read")
(time (many test-read 100))
(announce "read-char (file)")
(time (many test-read-char 100))
; (time (many test-read-byte 100))
(announce "read-char (string)")
(time (many test-read-char-from-string 100))
; (time (many test-read-byte-from-string 100))
(announce "string-ref")
(time (many test-string-ref 100)))
n))

Eli Barzilay

2006-04-09, 7:06 pm

Abdulaziz Ghuloum <aghuloum@c-s-remove-dashes.indiana.edu> writes:

> Eli Barzilay wrote:
>
>
> Please do share your findings with us.


The "And if I ever would do that" was an important qualifier for the
above sentence.

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!
William D Clinger

2006-04-09, 7:06 pm

Abdulaziz Ghuloum wrote:
> How much does Unicode support cost read-char? I honestly would like to
> know.


In Larceny, on a SPARC, when reading a UTF-8 file consisting
entirely of ASCII characters, the cost for UTF-8 support is two
machine instructions per character (a compare and conditional
branch).

In ten measurements of the cost of reading nboyer.sch 100
times, Larceny's ASCII-only version of read-char ran your
benchmark in an average of 737 milliseconds, while a UTF-8
version ran in an average of 733 milliseconds. The sample
deviations were 166 and 74 milliseconds respectively, so
the cost of Unicode support is way too small to measure
with this kind of benchmark.

(To those who don't know: On many microprocessors, such
things as the detailed alignment of basic blocks and branch
targets on various multiword boundaries can make a big
difference, so adding a couple of instructions is almost as
likely to make this kind of benchmark run faster as to make
it run slower. Worrying about this level of micro-performance
is insane.)

Thank you, BTW, for your message in which you dismissed
the cost of Unicode and other obfuscatory red herrings that
have been raised in this thread. Your advice to the original
poster was exactly right.

Will

Rob Thorpe

2006-04-10, 7:04 pm

Adrian wrote:
> Jens Axel S=F8gaard wrote:
>
>
>
> Thanks for the links! I've been reading through some of these today
> with a focus on scanning and parsing rather than code generation. The
> above paper specifies a scanner using a deterministic finite automata
> and a recursive descent parser. Are these considered the fastest way
> to scan and parse scheme code? Or are they used in the paper simply
> because they are either the simplest, or most elegant and
> instructional?


Recently I've been writing a C program this program needed to read and
write quite complex data. I decided to do this by embedding a scheme
reader and printer into it. I haven't implemented all of scheme read
syntax, the only big things I've missed though are the obscure number
formats and quasiquotation. I did mine using a non-deterministic
finite automata lexer and a small recursive-descent parser. Both
manually written in C. I haven't measured how fast it is, but it seems
OK.

A few notes, on this, some from doing this some from other parsers:-
* As others have pointed out IO before lexing is normally the
bottleneck.

* An IO bottleneck at the OS or library level can be avoided by reading
a large block of data at once into the program, then going through it
character by character.

* As others also pointed out parsing scheme is easy, lexing scheme is
harder though because of all the different number formats. My parser
has only 5 functions.

* I did not use flex, I wrote a manual NFA, which is complicated.
Avoid it if you can, I couldn't.

* Other character sets like Unicode can cause big problems unless
handled correctly. For example, if you use a DFA to do the lexing
then don't make it use 16-bit Unicode. If it does then it's tables
will be much bigger. Instead you can normally do the lexing on byte
characters by doing some conversion. There are other possibilities.

* Often parser speed does not matter because other things are slower,
such as networks files are stored on and code that uses the data

* When parsing programs there are lots of small files involved, so
quite a bit of time is spent opening and closing files rather than
reading them. The dreaded virus checker can make this slow on
windows.

Rob Thorpe

2006-04-10, 7:04 pm

William D Clinger wrote:
> Jens Axel S=F8gaard wrote:
>
> Recursive descent parsing is O(n), which is the fastest
> possible. There is a moderately large (and now obsolete)
> number of papers that claim recursive descent (and other
> forms of LL(k) parsing) is faster than LR parsing, and an
> equally large (and equally obsolete) literature claiming
> the opposite. It is now generally accepted that all of
> the standard O(n) methods, including recursive descent,
> are equally fast in theory, and that any differences
> that might be observed in practice are attributable to
> interactions between programming style, the compilers
> used to compile the parser, and the hardware used to run
> the parser.


The problem used to be that LALR parsing is table driven. Recursive
descent uses stack frames to store the same data, and generally lots of
stack space. On many old architectures there was a hard limit on the
size of the stack (eg 8086).

So, if you wanted to write an LL parser you had to make in table-driven
anyway, or you would run out of memory. But theres little point
writing an LL parser that way since using an LALR generator is easier.

These days, with no limits on stacks the problem is irrelevant.

> For those who haven't understood why so much of this thread
> has been about performance of compilers and interpreters:
> that is why. The parsing algorithm just doesn't matter
> very much, so long as it's O(n).


To an extent yes, hopefully the constant factor shouldn't be that bad
though.

>
> That is the main reason I cited hard numbers. In most
> systems, the speed of parsing is limited by the speed of
> character i/o.


Part of the reason people used to look at parser speed is because IO
used to be relatively faster than it is now. In recent times memory
and CPU speeds have gone up faster than IO latency.

Abdulaziz Ghuloum

2006-04-10, 7:04 pm

Rob Thorpe wrote:

> William D Clinger wrote:


>
> To an extent yes, hopefully the constant factor shouldn't be
> that bad though.


Yes. There is the constant that's part of the different parsing
algorithms. There is a more important constant that's part of the
implementation you use to implement the parser. If your implementation
hits you with a 100-fold slowdown (not MzScheme), then you will produce
a crappy parser(TM), no matter how brilliant a programmer you are or how
clever your parsing algorithm is.

Aziz,,,
William D Clinger

2006-04-10, 7:04 pm

Eli Barzilay wrote:
>
> (It does not -- reading from a string from a user point of view should
> be like reading from a file, so string ports are much more than a
> simple port+index.)


At Eli's request, I used my own LexGen and ParseGen
tools to generate a portable lexical analyzer and
parser for the subset of Scheme used in nboyer.sch,
and implemented character-level input from a string
representation of nboyer.sch. This gets rid of all
the variations between implementations of read-char
that were troubling Eli.

Here's the average of three runs, in seconds, parsing
nboyer.sch 1000 (I repeat, one thousand) times from a
string on a particular Solaris machine:

parse time sample relative
deviation performance

Chez Scheme 11.3 .06 100 %
v6.1 opt=2
Larceny 19.0 .02 59 %
v0.91a
MzScheme 277.0 1.3 4 %
v301
MzScheme 154.8 .4 7 %
v301.12

MzScheme v301.12 uses a just-in-time (JIT) compiler,
which provides a worthwhile improvement over interpreted
MzScheme: a factor of almost 2 on this benchmark.

The real value of this benchmark will come when I use
it to benchmark the C and Java code that LexGen and
ParseGen generate. I'll have to write some support
code first, though, and I didn't have time for that
today.

Will

Eli Barzilay

2006-04-10, 7:04 pm

More information, thanks to Matthew who clarified a few things, and
thanks to Will who produced a portable IO-independent parser to try.

First of all, going back to the original times that Will posted,
Matthew said that the IO features that makes `read-char' slower are
support for multiple port types, arbitrary p-ahead, non-character
values, progress events, port position counting, etc. These features
pile up branches in the code, with no common-case shortcuts. (Another
point that Matthew raised is that most of "nboyer.sch" is whitespaces
and comments, so the `read' time is dominated by `read-char' -- if you
look at the `read'-`read-char' deltas, you get the time that is
actually spent on parsing -- Chez's parser being really fast, and
Larceny's being around the neighborhood of Scheme48. To do this for
MzScheme too, the time it takes for a C char-by-char reading should be
subtracted, but I'm over my quota of spending time on this...)

As for the code that Will produced and timed: first a clarification --
Will ran the code on the new MzScheme which has a JIT, but not for
Sparc. The speedup that Will's numbers show is due to some recent
optimization like loop unrolling and inlining that happen at the
bytecode level. I ran Will's code in MzScheme v301, in v301.12 with
the JIT disabled, and in v301.12 with the JIT enabled; I repeated the
whole thing when the code is put in a module (which allows more
optimizations). Each test is the same as Will's (1000 reads of
"nboyer.sch"), on a two year old Linux with a single 2.8GHz CPU.
Repeated 5 times, and averaged the middle three results. (BTW, I
needed to slightly fix the code -- there was one undefined `nextToken'
identifier; also the behavior was different than MzScheme's in that it
didn't parse `1-' as a single symbol, so I changed "nboyer.sch" to
have `sub1' instead.)

The results are:

toplevel module
301 144.4(*) 141.1
301.12(JIT disabled) 75.3(*) 71.5
301.12(+ JIT) 29.6 21.3

The two (*) numbers correspond to Will's numbers, and the ratio is
roughly the same as in Will's table. So, extending Will's table will
give:

> parse time sample relative
> deviation performance
>
> Chez Scheme 11.3 .06 100 %
> v6.1 opt=2
> Larceny 19.0 .02 59 %
> v0.91a
> MzScheme 277.0 1.3 4 %
> v301
> MzScheme 154.8 .4 7 %
> v301.12

| MzScheme 43.8 26 %
| v301.12
| +JIT, in a module

and the speedup factor of everything that went into the new version of
MzScheme is more than 6.

(Disclaimer: this text is posted as is, without warranty of any kind.
Conclusions, if you see any, are not intended and purely accidental.)

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!
Abdulaziz Ghuloum

2006-04-10, 10:03 pm

Eli Barzilay wrote:

> First of all, going back to the original times that Will posted,
> Matthew said that the IO features that makes `read-char' slower are
> support for multiple port types, arbitrary p-ahead, non-character
> values, progress events, port position counting, etc. These features
> pile up branches in the code, with no common-case shortcuts.


For those unfamiliar with Chez Scheme, it's worth noting that it
provides all of the above-mentioned features, except arbitrary
p-ahead probably, and at the same time remains the fastest of
the bunch.

More info about Chez's IO is at: http://www.scheme.com/csug7/io.html


> (Another
> point that Matthew raised is that most of "nboyer.sch" is whitespaces
> and comments, so the `read' time is dominated by `read-char' -- if you
> look at the `read'-`read-char' deltas, you get the time that is
> actually spent on parsing


Wouldn't that be negative, since read took less time than read-char?
(I'm looking at the original figures that Will cited in
Message-ID: <1144427079.332746.320160@z34g2000cwc.googlegroups.com> )

Aziz,,,
Eli Barzilay

2006-04-10, 10:03 pm

Abdulaziz Ghuloum <aghuloum@c-s-remove-dashes.indiana.edu> writes:

> Eli Barzilay wrote:
>
>
> For those unfamiliar with Chez Scheme, it's worth noting that it
> provides all of the above-mentioned features, except arbitrary
> p-ahead probably, [...] http://www.scheme.com/csug7/io.html


(I don't see any mention of port types like network ports, pipe ports,
etc, non-character values are arbitrary Scheme values that can be
embedded in ports in MzScheme (eg picture values in DrScheme), nothing
on events (MzScheme has most of CML, with synchronizable event
objects), and port-position refers to counting lines and columns which
I don't see either.)


>
> Wouldn't that be negative, since read took less time than read-char?
> (I'm looking at the original figures that Will cited in
> Message-ID: <1144427079.332746.320160@z34g2000cwc.googlegroups.com> )


(Yes it would, just like I said in the part that you did not quote:
> To do this for MzScheme too, the time it takes for a C char-by-char
> reading should be subtracted, [...]

)

(But that was in parenthesis because I didn't want to say more on
that, and this post is all in parenthesis since I really don't want
(and did not try) to get into a pissing contest.)

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!
Abdulaziz Ghuloum

2006-04-11, 4:06 am

Eli Barzilay wrote:

> (But that was in parenthesis because I didn't want to say more on
> that, and this post is all in parenthesis since I really don't want
> (and did not try) to get into a pissing contest.)


And you're putting this in parenthesis because you don't want to say
more on it. Got it. Sorry, I wasn't aware of this convention.

(Sorry if you felt you were being dragged into a pissing contest.
Remember that participating in contests is a voluntary action.)

Aziz,,,
Rob Thorpe

2006-04-11, 4:06 am

Rob Thorpe wrote:
> Adrian wrote:
an.[color=darkred]
>
> Recently I've been writing a C program this program needed to read and
> write quite complex data. I decided to do this by embedding a scheme
> reader and printer into it. I haven't implemented all of scheme read
> syntax, the only big things I've missed though are the obscure number
> formats and quasiquotation. I did mine using a non-deterministic
> finite automata lexer and a small recursive-descent parser. Both
> manually written in C. I haven't measured how fast it is, but it seems
> OK.


BTW
I can post this implementation of read if anyone is interested,
copyright is not an issue.

William D Clinger

2006-04-11, 7:04 pm

Eli Barzilay wrote:
> First of all, going back to the original times that Will posted,
> Matthew said that the IO features that makes `read-char' slower are
> support for multiple port types, arbitrary p-ahead, non-character
> values, progress events, port position counting, etc. These features
> pile up branches in the code, with no common-case shortcuts.


That's unfortunate. Chez Scheme and Larceny support
extensible port types and various other features without
slowing the common case for read-char. Most systems
programmers try to make users pay for features only if
they use them, while optimizing the common cases. I
would recommend these principles to the original poster.

> (Another
> point that Matthew raised is that most of "nboyer.sch" is whitespaces
> and comments, so the `read' time is dominated by `read-char'


Whitespace and comments are indeed predominant in
well-written code, which is why compiler textbooks stress
the importance of scanning them quickly. Obviously, you
can't write a fast lexical analyzer without fast i/o.

And now for more plain silliness...

> (BTW, I
> needed to slightly fix the code -- there was one undefined `nextToken'
> identifier; also the behavior was different than MzScheme's in that it
> didn't parse `1-' as a single symbol, so I changed "nboyer.sch" to
> have `sub1' instead.)


nboyer.sch does not contain any occurrences of 1- as a
symbol. It did at one time, but that bug was corrected on
4 April 2001.

The nextToken procedure is never called by this parsing
benchmark, but I will remove the call to it in case some
compiler complains about references to undefined variables.

> | MzScheme 43.8 26 %
> | v301.12
> | +JIT, in a module


I am pleased to see that MzScheme can parse from a string
about as fast as it can read characters using read-char.
Or does the JIT compiler speed up the read-char benchmark
as well?

You asked me to write this benchmark to remove dependence
upon the i/o subsystem, and also to ensure that all systems
are running exactly the same Scheme source code. Doesn't
your use of MzScheme's module construct sort of defeat the
purpose, from your alleged point of view?

Will

William D Clinger

2006-04-11, 7:04 pm

Rob Thorpe wrote:
> * I did not use flex, I wrote a manual NFA, which is complicated.
> Avoid it if you can, I couldn't.


Every NFA is equivalent to some DFA, and DFAs are not
only easy to implement in Scheme, they are usually a
good deal more efficient when implemented in Scheme
than when implemented in C/C++/Java/whatever, which
must simulate the tail recursion of the more natural
implementation.

Before this thread is done, I will post links to my
LexGen and ParseGen tools, which generate efficient
lexers and parsers written in C, Java, or Scheme.

> * Other character sets like Unicode can cause big problems unless
> handled correctly. For example, if you use a DFA to do the lexing
> then don't make it use 16-bit Unicode. If it does then it's tables
> will be much bigger. Instead you can normally do the lexing on byte
> characters by doing some conversion. There are other possibilities.


The DFA will run faster if implemented procedurally
than if implemented as a table interpreter. When the
DFA is implemented procedurally, using Unicode (whether
16 or 21 bits) will make no measurable difference.

> Part of the reason people used to look at parser speed is because IO
> used to be relatively faster than it is now. In recent times memory
> and CPU speeds have gone up faster than IO latency.


How do you reconcile that with Eli's claim that the
read-char procedure, as implemented in MzScheme, is
slow because of CPU branch instructions and the like?

Will

Eli Barzilay

2006-04-11, 7:04 pm

"William D Clinger" <cesura17@yahoo.com> writes:

> nboyer.sch does not contain any occurrences of 1- as a symbol. It
> did at one time, but that bug was corrected on 4 April 2001.


(I must have found an old copy then. I found the problem because your
benchmark was also verifying the correct result.)


> The nextToken procedure is never called by this parsing benchmark,
> but I will remove the call to it in case some compiler complains
> about references to undefined variables.


(Using modules in MzScheme does complain about any undefined
identifier.)


>
> I am pleased to see that MzScheme can parse from a string about as
> fast as it can read characters using read-char. Or does the JIT
> compiler speed up the read-char benchmark as well?


Yes, but not much. The same test configurations have these numbers:

in module
v301 10.6 9.8
v301.12 -JIT 10.7 8.6
v301.12 +JIT 6.5 5.5


> You asked me to write this benchmark to remove dependence upon the
> i/o subsystem, and also to ensure that all systems are running
> exactly the same Scheme source code. Doesn't your use of MzScheme's
> module construct sort of defeat the purpose, from your alleged point
> of view?


Using a module is just a linguistic feature that allows the language
to perform more optimizations. Looking at the Chez manual, I see
this:

Optimize level 2 is safe, like optimize level 0, but allows the
compiler to assume that primitives like car always have their
original value. [...] If you wish to get a quick boost without
wrapping the program in a top-level let or module form and inserting
(import scheme) where appropriate, however, setting the optimization
level to 2 might still be useful.

Putting code in a MzScheme module serves a similar purpose, so it is
equivalent to your use of `opt=2'.

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!
Rob Thorpe

2006-04-11, 7:04 pm

William D Clinger wrote:
> Rob Thorpe wrote:
>
> Every NFA is equivalent to some DFA, and DFAs are not
> only easy to implement in Scheme, they are usually a
> good deal more efficient when implemented in Scheme
> than when implemented in C/C++/Java/whatever, which
> must simulate the tail recursion of the more natural
> implementation.


Yes. The program is question needs to work on Windows. When I started
writing it the only lexer generation tool that outputted C and I
trusted was flex. When I last tried I couldn't get flex to work
properly on Windows.

I know that DFAs are generally more efficient, but try building one by
hand!

> Before this thread is done, I will post links to my
> LexGen and ParseGen tools, which generate efficient
> lexers and parsers written in C, Java, or Scheme.


Thanks, I've searched the web and downloaded the code I'll give LexGen
a try.

>
> The DFA will run faster if implemented procedurally
> than if implemented as a table interpreter. When the
> DFA is implemented procedurally, using Unicode (whether
> 16 or 21 bits) will make no measurable difference.


Really?
As I see it there are two ways to begin a lexing routine, either
1) Take a character and do a table lookup. From that lookup either
execute code or lookup data
2) Go through a tree of conditional statements on a character.

As I see it #2 can only be faster if the table in #1 is very big, e.g.
in the case of 16bit or 21bit Unicode.

Is there something that makes this untrue?

>
> How do you reconcile that with Eli's claim that the
> read-char procedure, as implemented in MzScheme, is
> slow because of CPU branch instructions and the like?


I was talking about IO latency. The actual time hard-disks take to
move from one area on their surface to another is relatively very slow.
Once buffering is taken into account reading from any individual file
is very fast. So reading one long file can be very fast. Reading many
small files, which is common in program builds, is relatively slower.

William D Clinger

2006-04-11, 7:04 pm

Rob Thorpe wrote:
> As I see it there are two ways to begin a lexing routine, either
> 1) Take a character and do a table lookup. From that lookup either
> execute code or lookup data
> 2) Go through a tree of conditional statements on a character.
>
> As I see it #2 can only be faster if the table in #1 is very big, e.g.
> in the case of 16bit or 21bit Unicode.
>
> Is there something that makes this untrue?


Yes. If you go through a jump table, the jump will be
indirect. With most current microprocessors, indirect
jumps are slower than direct jumps. (And if you fetch
data from a table, you will need a second case dispatch
to interpret the data.)

The problem is essentially the same that compilers
face when compiling a case expression in Scheme or a
switch statement in C/C++/Java. If the number of cases
is very small, sequential search will probably be the
fastest way to implement the case expression. If the
number of cases is larger, a binary search will probably
be faster than a sequential search. If the number of
cases is very large, and dense within some range, then
a jump table will probably be faster than binary search.

The compiler probably knows more than you do about the
fastest way to implement a case expression on the target
architecture. That's why you should use case expressions,
and let the compiler figure out the fastest way. In all
probability, the fastest technique will vary from one
state of your DFA to another.

Regardless of the technique, whether your case expressions
dispatch on ASCII or Unicode characters makes hardly any
difference. You can see this by looking at the generated
code.

Will

Abdulaziz Ghuloum

2006-04-11, 7:04 pm

Eli Barzilay wrote:

> Using a module is just a linguistic feature that allows the language
> to perform more optimizations. Looking at the Chez manual, I see
> this:
>
> Optimize level 2 is safe, like optimize level 0, but allows the
> compiler to assume that primitives like car always have their
> original value. [...] If you wish to get a quick boost without
> wrapping the program in a top-level let or module form and inserting
> (import scheme) where appropriate, however, setting the optimization
> level to 2 might still be useful.
>
> Putting code in a MzScheme module serves a similar purpose, so it is
> equivalent to your use of `opt=2'.


Please Eli, wrapping things in a module does more than that. It allows
the compiler to perform interprocedural optimizations (like copy-
propagation, constant-folding, inlining, direct-call optimization,
specializing calling-conventions, redundant-checks elimination, and
much much more). I would not be surprized if Chez gained another order
of magnitude in speed by wrapping the read-char benchmark in a module
(or even a let) even at optimize-level 2. So, optimize-level 2 and
wrapping things in a module are definately NOT the same thing. Does
MzScheme have the equivalent of the ``--prim'' in mzc or optimize-level
2 in Chez? If there is one that delivers better performance without
*requiring* using the module system[*], well that would be useful.

Aziz,,,

[*] I don't have problem with module systems, it's just that I don't
like to have to remember the exact syntax of every implementation's
idea of what modules look like. The two implementations I use regularly
have the exact same idea of what modules are, so it's not hard for me to
context-switch between the two. Not to mention that I believe
compilers should work for the users and not impact their coding style
too much. Most Scheme users I know (your circle may be different) would
never touch a module system; they prefer code that's clean and simple
and left-flushed to the file. They do, however, add (optimize-level 2)
to the top of their files to get some 4x or 8x performance boost.
Abdulaziz Ghuloum

2006-04-11, 7:04 pm

Rob Thorpe wrote:

> I know that DFAs are generally more efficient, but try building one by
> hand!


You just keep where you are in the automata as part of the name of the
procedure representing that state. It's actually pretty
straight-forward. Here are the names of states I have in my parser:
(note: very simple integers only)

Semi ; for comments
Str ; for strings
Sym ; for symbols
SymB ; for |symbols|
Sign ; for +, -
Signed ; for a number after a sign.
Num ; for numbers with no sign
Dot ; for ... and the like
Unq ; for `, `@
Hash ; for things starting with #
Hash2 ; #2%prims
Hash3 ; #3%prims
HashBar ; #| commenets |#
HashBarBar ; #| comments | skipping over | bars |#
HashBang ; #!
HashBangE ; #!e
HashBandEO ; #!eo going for #!eof
HashS ; #\c
HashB ; #b binary numbers
HashBS ; #b+ and #b-
HashBSI ; #b10101011
HashX ; same for hexadecimal numbers
HashXS
HashXSI

Aziz,,,
Eli Barzilay

2006-04-11, 7:04 pm

Abdulaziz Ghuloum <aghuloum@c-s-remove-dashes.indiana.edu> writes:

> Eli Barzilay wrote:
>
>
> Please Eli, wrapping things in a module does more than that. It
> allows the compiler to perform interprocedural optimizations (like
> copy- propagation, constant-folding, inlining, direct-call
> optimization, specializing calling-conventions, redundant-checks
> elimination, and much much more). I would not be surprized if Chez
> gained another order of magnitude in speed by wrapping the read-char
> benchmark in a module (or even a let) even at optimize-level 2.


I read the above paragraph as saying that opt=2 is not needed if
you're using a module. In fact, the paragraph on opt=2 begins with:

Another useful optimization level is 2, but its use is no longer
recommended. [...]

In any case, running the same code in Chez with opt=2 and with a
module (and with both) should be easy, so there's no point arguing on
what it does. Personally, I'm very impressed with what Chez does; if
I see the numbers of running the same code in a module in Chez, and
they indeed are an order of magnitude faster, I will still be
impressed.

Like I said, I never intended to get into a pissing contest. On the
other hand, you were the one who said:

| MzScheme is slow because it's an interpreter. Whether you want to
| call it a straight-forward interpreter or an optimizing interpreter,
| it's still an interpreter. [...]


> So, optimize-level 2 and wrapping things in a module are definately
> NOT the same thing.


If Chez + code-in-module is faster than opt=2, then I think that
modules should be used.


> Does MzScheme have the equivalent of the ``--prim'' in mzc or
> optimize-level 2 in Chez? If there is one that delivers better
> performance without *requiring* using the module system[*], well
> that would be useful.


Yes, there is a MzScheme equivalence for `--prim' -- you put the code
in a module. This gives you the equivalent power of `--prim', in a
much better way -- instead of a command-line argument that says "trust
me, I didn't play any funny games", you have a language feature that
makes it impossible to play such funny games. If you don't want to
put code in a module, then there is no equivalence for --prim, and
personally, I don't even want --prim since it always is a hack. I'd
just use CL instead of --prim, since they just forbid the kind of
changes that violate it.


> [*] I don't have problem with module systems, it's just that I don't
> like to have to remember the exact syntax of every implementation's
> idea of what modules look like. The two implementations I use
> regularly have the exact same idea of what modules are, so it's not
> hard for me to context-switch between the two. Not to mention that
> I believe compilers should work for the users and not impact their
> coding style too much. Most Scheme users I know (your circle may be
> different) would never touch a module system; they prefer code
> that's clean and simple and left-flushed to the file. They do,
> however, add (optimize-level 2) to the top of their files to get
> some 4x or 8x performance boost.


My circles are very different, yes. Indentation is just a cosmetic
issue (with a solution, if you really want one), but I don't see how
the compiler is working for me when I can use a module system to say
"trust me" in a formal way[*]. More than that -- I have many examples
of very useful code that would be completely impossible without a
module system, if all I had was that "clean and simple" toplevel.

[*] That is, this kind of "working for me" is something I don't want
it to do. It's the same kind of working-for-me as other flags like
"assume that x is an integer", or "assume that all arithmetics is
fixnum".

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://www.barzilay.org/ Maze is Life!
Rob Thorpe

2006-04-12, 4:03 am

Abdulaziz Ghuloum wrote:
> Rob Thorpe wrote:
>
>
> You just keep where you are in the automata as part of the name of the
> procedure representing that state. It's actually pretty
> straight-forward. Here are the names of states I have in my parser:
> (note: very simple integers only)
>
> Semi ; for comments
> Str ; for strings
> Sym ; for symbols
> SymB ; for |symbols|
> Sign ; for +, -
> Signed ; for a number after a sign.
> Num ; for numbers with no sign
> Dot ; for ... and the like
> Unq ; for `, `@
> Hash ; for things starting with #
> Hash2 ; #2%prims
> Hash3 ; #3%prims
> HashBar ; #| commenets |#
> HashBarBar ; #| comments | skipping over | bars |#
> HashBang ; #!
> HashBangE ; #!e
> HashBandEO ; #!eo going for #!eof
> HashS ; #\c
> HashB ; #b binary numbers
> HashBS ; #b+ and #b-
> HashBSI ; #b10101011
> HashX ; same for hexadecimal numbers
> HashXS
> HashXSI


Well you can do that and it will work fine for a deterministic finite
automata, but what do you do when the finite automata becomes
non-deterministic?

For example imagine the lexer has read:
456
And is at the 6. The next character read is ".", at this stage the
number may be an integer or floating point, or symbol. After that if
another number is read then we have a floating point number, if a
delimiter is read then its a whole number, if it's a character that is
part of scheme number syntax is read then it's still a number of some
kind. If its a character that is not part of the syntax then it's a
symbol!

The simplest way to deal with these situations is to use an NFA, make
an assumption about what is being read, then if it's false fail to some
starts state and re-read. It isn't the fastest way to do it of-course.

For example given an DFA the state the lexer goes into after reading
the dot would be best described as:
"Main part of number has finished, dot read, might be number or symbol
(or error)"
Not the best name for a function or variable, and not easy to shorten
into anything meaningful. This is one reason why DFAs are generally
constructed by programs.

Rob Thorpe

2006-04-12, 7:04 pm

William D Clinger wrote:
> Rob Thorpe wrote:
>
> Yes. If you go through a jump table, the jump will be
> indirect. With most current microprocessors, indirect
> jumps are slower than direct jumps. (And if you fetch
> data from a table, you will need a second case dispatch
> to interpret the data.)


Sure. It depends on the amount of data to look through. If using an
indirect jump costs 10 cycles, then if you can fit a set of directly
branching tests into that time then they will be faster. For say 255
possibilities I suppose its possible that a direct branching test may
be as fast as an indirect one, particularly if it filters off the most
common possiblities first.

> The problem is essentially the same that compilers
> face when compiling a case expression in Scheme or a
> switch statement in C/C++/Java. If the number of cases
> is very small, sequential search will probably be the
> fastest way to implement the case expression. If the
> number of cases is larger, a binary search will probably
> be faster than a sequential search. If the number of
> cases is very large, and dense within some range, then
> a jump table will probably be faster than binary search.
>
> The compiler probably knows more than you do about the
> fastest way to implement a case expression on the target
> architecture. That's why you should use case expressions,
> and let the compiler figure out the fastest way.


Maybe. I think it depends on the compiler. I remember reading a
discussion on the GCC mailing list where someone said that no-one uses
case statements with more than 400 cases, and that the compiler could
more or less assume that any code that did would be broken and not
attempt to generate good code for it.

> In all
> probability, the fastest technique will vary from one
> state of your DFA to another.


Yep.

> Regardless of the technique, whether your case expressions
> dispatch on ASCII or Unicode characters makes hardly any
> difference. You can see this by looking at the generated
> code.


But if you use ASCII then there are only 255 possibilites, so its easy
to use an indirect branch table on the basis of a byte. If it can be
done faster with a linear or binary search then that's good.

If 16-bit unicode is used though then a jump table would be huge.
Indirect branching wouldn't be that practical, so some combination of
binary and linear searching would be needed. I can't see that this
would be fast, certainly not as fast as a table or a small number of
direct branches.

You could of-course forget about some large set of Unicode characters
and more or less treat it as ASCII.

William D Clinger

2006-04-12, 7:04 pm

Rob Thorpe wrote:
> Sure. It depends on the amount of data to look through. If using an
> indirect jump costs 10 cycles, then if you can fit a set of directly
> branching tests into that time then they will be faster. For say 255
> possibilities I suppose its possible that a direct branching test may
> be as fast as an indirect one, particularly if it filters off the most
> common possiblities first.


I think you may be forgetting that the number of possible
characters on which you are dispatching is not what
matters. What matters is the number of case clauses,
and the number of subranges that select those clauses.

In a lexical analyzer for Scheme, for example, the most complex
state has about 15 possible actions. The number of subranges
is completely independent of the size of the character set,
so the size of the character set has zero influence on the
cost of the dispatch.

> Maybe. I think it depends on the compiler. I remember reading a
> discussion on the GCC mailing list where someone said that no-one uses
> case statements with more than 400 cases, and that the compiler could
> more or less assume that any code that did would be broken and not
> attempt to generate good code for it.