For Programmers: Free Programming Magazines  


Home > Archive > Functional > August 2007 > Minim compiler as an OCaml macro









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Minim compiler as an OCaml macro
Jon Harrop

2007-08-17, 8:06 am


Following Mark Tarver's recent shootout challenge to implement a program to
evaluate Minim programs, Pascal Constanza wrote a Minim compiler as a Lisp
compiler macro that has been by far the fastest of the implementations,
several orders of magnitude faster than most other implementations.

The following is a complete OCaml macro that extends the syntax of the OCaml
language to include the Minim sublanguage and expands minim programs into
OCaml code that is then compiled interactively, the same as the Lisp. This
macro performs tail-call optimization on the Lisp code, to ensure that the
resulting use of continuation passing style is robust. Specifically, an if
statement where both branches are tail calls is itself translated into a
tail call.

The OCaml macro is a third the length of the Lisp (in LOC) and can also be
extended to remove the superfluous parentheses from the Minim language and
implement operator precedences and associativites with virtually no
overhead:

open Camlp4.PreCast;;
open Camlp4.PreCast.Syntax;;

let mmstatement = Gram.Entry.mk "mmstatement";;
let mmvalue = Gram.Entry.mk "mmvalue";;
let mmtest = Gram.Entry.mk "mmtest";;
let mmident = Gram.Entry.mk "mmident";;

let rec compile _loc tag e = function
| [] -> <:binding< $lid:tag$ = fun () -> $e$ >>
| `TL e'::`T tag'::t ->
let bs = compile _loc tag' <:expr< () >> t in
<:binding< $lid:tag$ = fun () -> $e$; $e'$ and $bs$ >>
| (`TL e' | `E e')::t -> compile _loc tag <:expr< $e$; $e'$ >> t
| `T tag'::t ->
let bs = compile _loc tag' <:expr< () >> t in
<:binding< $lid:tag$ = fun () -> $e$; $lid:tag'$() and $bs$ >>;;

let compile _loc ss = compile _loc "entry" <:expr< () >> ss;;

EXTEND Gram
str_item: LEVEL "top"
[ [ "MINIM"; "("; ss=LIST1 mmstatement; ")" ->
<:str_item< let rec $compile _loc ss$ >>
] ];
mmstatement:
[ [ "("; x=mmident; "is"; y=mmvalue; ")" -> `E <:expr< $lid:x$ := $y$ >>
| "("; "++"; x=mmident; ")" -> `E <:expr< incr $lid:x$ >>
| "("; "--"; x=mmident; ")" -> `E <:expr< decr $lid:x$ >>
| "("; "if"; p=mmtest; "then"; t=mmstatement; "else";
f=mmstatement; ")" ->
(match t, f with
| `TL t, `TL f -> `TL <:expr< if $p$ then $t$ else $f$ >>
| (`TL t | `E t), (`TL f | `E f) ->
`E <:expr< if $p$ then $t$ else $f$ >>
| _ -> invalid_arg "Tag in 'if' expression")
| "("; "goto"; t=mmident; ")" -> `TL <:expr< $lid:t$() >>
| "("; "print"; s=STRING; ")" ->
let s = String.escaped s in
`E <:expr< print_string $str:s$ >>
| "("; "print"; x=mmvalue; ")" -> `E <:expr< print_int $x$ >>
| "nl" -> `E <:expr< print_newline() >>
| "("; "input"; x=mmident; ")" ->
`E <:expr< $lid:x$ := int_of_string(input_line stdin) >>
| tag=mmident -> `T tag ] ];
mmvalue:
[ [ x=mmident -> <:expr< ! $lid:x$ >>
| n=INT -> <:expr< $int:n$ >> ] ];
mmtest:
[ [ "("; f=mmvalue; "<"; g=mmvalue; ")" -> <:expr< $f$ < $g$ >>
| "("; f=mmvalue; "="; g=mmvalue; ")" -> <:expr< $f$ = $g$ >>
| "("; f=mmvalue; ">"; g=mmvalue; ")" -> <:expr< $f$ > $g$ >>
| "("; f=mmtest; "and"; g=mmtest; ")" -> <:expr< $f$ && $g$ >>
| "("; f=mmtest; "or"; g=mmtest; ")" -> <:expr< $f$ || $g$ >>
| "("; "not"; f=mmtest; ")" -> <:expr< not $f$ >> ] ];
mmident:
[ [ x=LIDENT -> "_"^x
| x=UIDENT -> "_"^x
| "end" -> "_end" ] ];
END;;

This macro may be invoked as follows:

let _x, _y = ref 0, ref 0;;

MINIM(
(print "Add x and y (what a feat!)")
nl
(print "Input x: ")
(input x)
nl
(print "Input y: ")
(input y)
main
(if (x = 0) then (goto end) else (goto sub1x))

sub1x
(-- x)
(++ y)
(goto main)

end
nl
(print "The total of x and y is ")
(print y)
nl
);;

The code generated by the expansion of the macro is:

let rec entry () =
(();
print_string "Add x and y (what a feat!)";
print_newline ();
print_string "Input x: ";
_x := int_of_string (input_line stdin);
print_newline ();
print_string "Input y: ";
_y := int_of_string (input_line stdin);
_main ())
and _main () = ((); if !_x = 0 then _end () else _sub1x ())
and _sub1x () = ((); decr _x; incr _y; _main ())
and _end () =
(();
print_newline ();
print_string "The total of x and y is ";
print_int !_y;
print_newline ());;

The new definitions are invoked by calling entry(). Performance from a
native-code ocamlnat top-level is:

# time entry ();;
Add x and y (what a feat!)
Input x: 10000000
Input y: 10000000
The total of x and y is 20000000
Took 0.094985s
- : unit = ()

Note that this is still 3.3x slower than Pascal's Lisp:

* (time (test-minim))
Add x and y
Input x: 10000000
Input y: 10000000

The total of x and y is 20000000
Evaluation took:
0.032 seconds of real time
0.030996 seconds of user run time
0.0 seconds of system run time
0 calls to %EVAL
0 page faults and
8,160 bytes consed.
NIL

Using batch compilation at the cost of non-interactivity makes the OCaml as
fast as the Lisp. However, the Lisp is interactive...

On a related note, I've been writing programs to evaluate Brainf*ck
programs. My MetaOCaml implementations that used staged native code
compilation at run-time are no faster than a naive interpreter. I believe
the reason is simply that MetaOCaml is not good at dealing with target
languages that use gotos because OCaml does not compile CPS as efficiently
as ordinary calls.

--
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/produc...entists/?usenet
Karol Skocik

2007-08-17, 8:06 am

On Aug 17, 10:31 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> Following Mark Tarver's recent shootout challenge to implement a program to
> evaluate Minim programs, Pascal Constanza wrote a Minim compiler as a Lisp
> compiler macro that has been by far the fastest of the implementations,
> several orders of magnitude faster than most other implementations.
>
> The following is a complete OCaml macro that extends the syntax of the OCaml
> language to include the Minim sublanguage and expands minim programs into
> OCaml code that is then compiled interactively, the same as the Lisp. This
> macro performs tail-call optimization on the Lisp code, to ensure that the
> resulting use of continuation passing style is robust. Specifically, an if
> statement where both branches are tail calls is itself translated into a
> tail call.
>
> The OCaml macro is a third the length of the Lisp (in LOC) and can also be
> extended to remove the superfluous parentheses from the Minim language and
> implement operator precedences and associativites with virtually no
> overhead:
>
> open Camlp4.PreCast;;
> open Camlp4.PreCast.Syntax;;
>
> let mmstatement = Gram.Entry.mk "mmstatement";;
> let mmvalue = Gram.Entry.mk "mmvalue";;
> let mmtest = Gram.Entry.mk "mmtest";;
> let mmident = Gram.Entry.mk "mmident";;
>
> let rec compile _loc tag e = function
> | [] -> <:binding< $lid:tag$ = fun () -> $e$ >>
> | `TL e'::`T tag'::t ->
> let bs = compile _loc tag' <:expr< () >> t in
> <:binding< $lid:tag$ = fun () -> $e$; $e'$ and $bs$ >>
> | (`TL e' | `E e')::t -> compile _loc tag <:expr< $e$; $e'$ >> t
> | `T tag'::t ->
> let bs = compile _loc tag' <:expr< () >> t in
> <:binding< $lid:tag$ = fun () -> $e$; $lid:tag'$() and $bs$ >>;;
>
> let compile _loc ss = compile _loc "entry" <:expr< () >> ss;;
>
> EXTEND Gram
> str_item: LEVEL "top"
> [ [ "MINIM"; "("; ss=LIST1 mmstatement; ")" ->
> <:str_item< let rec $compile _loc ss$ >>
> ] ];
> mmstatement:
> [ [ "("; x=mmident; "is"; y=mmvalue; ")" -> `E <:expr< $lid:x$ := $y$ >>
> | "("; "++"; x=mmident; ")" -> `E <:expr< incr $lid:x$ >>
> | "("; "--"; x=mmident; ")" -> `E <:expr< decr $lid:x$ >>
> | "("; "if"; p=mmtest; "then"; t=mmstatement; "else";
> f=mmstatement; ")" ->
> (match t, f with
> | `TL t, `TL f -> `TL <:expr< if $p$ then $t$ else $f$ >>
> | (`TL t | `E t), (`TL f | `E f) ->
> `E <:expr< if $p$ then $t$ else $f$ >>
> | _ -> invalid_arg "Tag in 'if' expression")
> | "("; "goto"; t=mmident; ")" -> `TL <:expr< $lid:t$() >>
> | "("; "print"; s=STRING; ")" ->
> let s = String.escaped s in
> `E <:expr< print_string $str:s$ >>
> | "("; "print"; x=mmvalue; ")" -> `E <:expr< print_int $x$ >>
> | "nl" -> `E <:expr< print_newline() >>
> | "("; "input"; x=mmident; ")" ->
> `E <:expr< $lid:x$ := int_of_string(input_line stdin) >>
> | tag=mmident -> `T tag ] ];
> mmvalue:
> [ [ x=mmident -> <:expr< ! $lid:x$ >>
> | n=INT -> <:expr< $int:n$ >> ] ];
> mmtest:
> [ [ "("; f=mmvalue; "<"; g=mmvalue; ")" -> <:expr< $f$ < $g$ >>
> | "("; f=mmvalue; "="; g=mmvalue; ")" -> <:expr< $f$ = $g$ >>
> | "("; f=mmvalue; ">"; g=mmvalue; ")" -> <:expr< $f$ > $g$ >>
> | "("; f=mmtest; "and"; g=mmtest; ")" -> <:expr< $f$ && $g$ >>
> | "("; f=mmtest; "or"; g=mmtest; ")" -> <:expr< $f$ || $g$ >>
> | "("; "not"; f=mmtest; ")" -> <:expr< not $f$ >> ] ];
> mmident:
> [ [ x=LIDENT -> "_"^x
> | x=UIDENT -> "_"^x
> | "end" -> "_end" ] ];
> END;;
>
> This macro may be invoked as follows:
>
> let _x, _y = ref 0, ref 0;;
>
> MINIM(
> (print "Add x and y (what a feat!)")
> nl
> (print "Input x: ")
> (input x)
> nl
> (print "Input y: ")
> (input y)
> main
> (if (x = 0) then (goto end) else (goto sub1x))
>
> sub1x
> (-- x)
> (++ y)
> (goto main)
>
> end
> nl
> (print "The total of x and y is ")
> (print y)
> nl
> );;
>
> The code generated by the expansion of the macro is:
>
> let rec entry () =
> (();
> print_string "Add x and y (what a feat!)";
> print_newline ();
> print_string "Input x: ";
> _x := int_of_string (input_line stdin);
> print_newline ();
> print_string "Input y: ";
> _y := int_of_string (input_line stdin);
> _main ())
> and _main () = ((); if !_x = 0 then _end () else _sub1x ())
> and _sub1x () = ((); decr _x; incr _y; _main ())
> and _end () =
> (();
> print_newline ();
> print_string "The total of x and y is ";
> print_int !_y;
> print_newline ());;
>
> The new definitions are invoked by calling entry(). Performance from a
> native-code ocamlnat top-level is:
>
> # time entry ();;
> Add x and y (what a feat!)
> Input x: 10000000
> Input y: 10000000
> The total of x and y is 20000000
> Took 0.094985s
> - : unit = ()
>
> Note that this is still 3.3x slower than Pascal's Lisp:
>
> * (time (test-minim))
> Add x and y
> Input x: 10000000
> Input y: 10000000
>
> The total of x and y is 20000000
> Evaluation took:
> 0.032 seconds of real time
> 0.030996 seconds of user run time
> 0.0 seconds of system run time
> 0 calls to %EVAL
> 0 page faults and
> 8,160 bytes consed.
> NIL
>
> Using batch compilation at the cost of non-interactivity makes the OCaml as
> fast as the Lisp. However, the Lisp is interactive...
>
> On a related note, I've been writing programs to evaluate Brainf*ck
> programs. My MetaOCaml implementations that used staged native code
> compilation at run-time are no faster than a naive interpreter. I believe
> the reason is simply that MetaOCaml is not good at dealing with target
> languages that use gotos because OCaml does not compile CPS as efficiently
> as ordinary calls.
>
> --
> Dr Jon D Harrop, Flying Frog Consultancy
> OCaml for Scientistshttp://www.ffconsultancy.com/products/ocaml_for_scientists/?usenet


what a horrible unreadable mess... !
thank god for lisp parens so that macros don't look like this.

Pascal Costanza

2007-08-17, 8:06 am

Jon Harrop wrote:
> Following Mark Tarver's recent shootout challenge to implement a program to
> evaluate Minim programs, Pascal Constanza wrote a Minim compiler as a Lisp
> compiler macro that has been by far the fastest of the implementations,
> several orders of magnitude faster than most other implementations.


....and here is the kicker: I couldn't care less...

> Using batch compilation at the cost of non-interactivity makes the OCaml as
> fast as the Lisp. However, the Lisp is interactive...


....because that's more important, among other things.

The result is no surprise: It is well known in the Lisp community that
you can make Lisp code run as fast as required, provided you are willing
to invest the time to do so. The Minim interpreter was especially
straightforward to optimize because the Minim language consists mainly
of constructs that have direct counterparts in Common Lisp. It mostly
serves as an illustration that high efficiency is indeed possible to
achieve in Common Lisp. However, it is still a toy example and doesn't
say a lot about more complex and involved programs.

Lisp and Scheme give you a lot of advantages when you are interested in
dynamic and interactive features, and want to consider programs as
"living" systems that you can interact with, not as text files that need
to be translated and that you cannot touch anymore after such
translation. In Lisp, only if and when you detect serious bottlenecks,
you invest the time to detect the hot spots and then optimize them. At
least, that's the typical approach.

Lisp buys you flexibility by default, but you have to work for
efficiency. (That's in general true for other dynamic languages as well,
like Smalltalk, Prolog, etc.) In static languages, it's the other way
around: They buy you (low-level) efficiency by default, but you have to
work for flexibility. This gives you a pretty good rule of thumb when to
use what kind of languages.

Toy examples and toy benchmarks don't give you any useful rules of
thumbs. Knowing that a simple ray tracer is more efficient in OCaml and
that a simple assembly-style language is more efficient in Common Lisp
is pretty uninteresting. They are not realistic enough to be
representative of real-world software.



Pascal

P.S.: It's "Costanza", not "Constanza".

--
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
Alex Mizrahi

2007-08-17, 7:11 pm

(message (Hello 'Jon)
(you :wrote :on '(Fri, 17 Aug 2007 09:31:09 +0100))
(

JH> let rec compile _loc tag e = function
JH> | [] -> <:binding< $lid:tag$ = fun () -> $e$ >>
JH> | `TL e'::`T tag'::t ->

Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn

JH> let bs = compile _loc tag' <:expr< () >> t in
JH> <:binding< $lid:tag$ = fun () -> $e$; $e'$ and $bs$ >>

)
(With-best-regards '(Alex Mizrahi) :aka 'killer_storm)
"choose no life")


George Neuner

2007-08-18, 8:05 am

On Fri, 17 Aug 2007 18:07:48 +0300, "Alex Mizrahi"
<udodenko@users.sourceforge.net> wrote:


>Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn


He's awake and posting to c.l.l. What's the spell to send him back?

George
--
for email reply remove "/" from address
Joachim Durchholz

2007-08-18, 8:05 am

David Golden schrieb:
> The mistake is to suppose that once a macro is called, the Lisp system
> enters a "macro world," so naturally everything in that world must be
> defined using defmacro. This is the wrong picture. The right picture is
> that defmacro enables a step into the ordinary Lisp world, but in which
> the principal object of manipulation is Lisp code. Once that step is
> taken, one uses ordinary Lisp function definitions


I would have expected that. It would have been nonsensical to use a
special macro language just to transform S-expressions; ordinary Lisp
would do fine.
Actually this is the usual setup in any environment where there is no
distinction between a compilation and a run-time phase. (That's actually
something that I appreciate very much, it's just that this kind of
environment can be very tricky to separate in independent parts. That's
my main issue with such environments: the temptation to drop yet another
barrier is too high because the benefits are so large, resulting in
systems that work excellently on a small scale but require extraordinary
discipline when the system starts to scale up.)

Back to the quote: it's only partly correct. In a macro, you want to do
drastically different things (transform code, instead of process user
requests), and you have a drastically different environment
(compilation, not runtime). So while the quote is correct in the sense
that it's still the same old Lisp, it's semantically still a "macro world".
Of course, that's not what the quote meant. It's just that I'd like to
complement that "the world isn't that different inside macros" view with
"but there *are* some pretty important differences"; in my eyes, Lisp
minimizes the differences that can be minimized, but some differences
cannot be eliminated - and I still think that the additional
programmer-visible complexity is a high price to pay for some additional
ways to write shorter code. *shrug* possibly just a cultural thing.
Possibly.

Regards,
Jo
Jon Harrop

2007-08-18, 7:07 pm

Joachim Durchholz wrote:
> ...in my eyes, Lisp minimizes the differences that can be minimized...


Look at the source code of cl-infix.

--
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/produc...entists/?usenet
Markus

2007-08-19, 4:27 am


David Golden wrote:

> Joachim Durchholz wrote:
>
>
>
> Well, good news for you then: ;-)
> If you do want some variant quotation mechanism in Lisp, you can just
> implement it as a macro!


Yeah, yeah, thanks. I'm sure, Jon is absolutely enraptured. Probably
this is the very thing he has been hoping for. Next w will see him
doing everything in Lisp from now on.

> One possible explanation for this mistake may be that in other


"this mistake" => The mistake none of us made.

Regards -- Markus
David Golden

2007-08-19, 8:04 am

Markus E.L. 2 wrote:

> Yeah, yeah, thanks. I'm sure, Jon is absolutely enraptured.


That was a reply to Joachim - he appeared to be considering
nonextensible quoting something of an advantage due
to less "complexity", I was helpfully pointing out that it wasn't
really an advantage that existed in lisp, one can simply implement
macros with different behaviour, reader and ordinary macros both being
pretty arbitrary lisp code in lisp. It wouldn't be hugely _sensible_ to
in many situations, but you can bet if the freedom to do so was somehow
taken away, people would be annoyed.

> "this mistake" => The mistake none of us made.


Speaking of Jon - As it happens,for his part, Jon (Harrop, at least) did
fundamentally misunderstand (or pretend to) lisp macros, as you can see
here (yes I killfiled Jon like I said I would, but not all respondents
to his posts. Sigh):
http://groups.google.com/group/comp...910b12119b1c30e

It looks like Jon thought lisp macros were a sexp-syntax rewrite
language where defmacro forms specify template rules. That roughly
describes scheme define-syntax "macros by example", not lisp macros.
In conjunction with backquoting, many simple convenience macros often
look like templates but there's no law that says they must (in lisp).





David Golden

2007-08-19, 7:09 pm

Joachim Durchholz wrote:

> I would have expected that. It would have been nonsensical to use a
> special macro language just to transform S-expressions; ordinary Lisp
> would do fine.


You may have expected that, but note that a special macro language is
pretty much what the schemers do, in an attempt to maintain properties
they consider desirable (Kiselyov, Petrofsky results undermining that
attempt...). The scheme macro language [1] still has a sexp syntax,
but is rather different to lisp macros written in unrestricted
lisp. This is one of the more major issues in the scheme/lisp divide
(though in practice it is possible to implement scheme-like macros in
lisp, as seen in [2], and major scheme implementations tend to provide
something much like lisp macros).

Other minor quibble/clarification: "s-expression" really means the
textual external form (though is obviously deliberately very close in
perceived structure to the internal form) - the same tree could have a
different textual external form given a different printer (and reader
if you want to read it back in...). By the time an ordinary macro*
sees lisp code, it's a tree of live lisp objects.

* As distinct from a reader macro, which gets passed the unparsed
stream.

> I still think that the additional
> programmer-visible complexity is a high price to pay for some
> additional ways to write shorter code.


Shrug - they just transform code. If you can write and understand a
function that inputs and outputs cons list/tree structure, you can
probably write and understand a macro. Not necessarily a good one first
time around, but that's a matter of learning. Ob c.l.f - it would
usually be best if the macro was reasonably pure, too, certainly if a
macro tried to mutate its arguments you'd tend to stray into
implementation-specific territory - i.e. you should tend to write macro
transformations in a functional style...

"complexity" might come from the possibility of code being transformed
by a macro in the first place, but once you decide you _want_ to
transform code, you don't really get much simpler than lisp macros -
i.e. IF you're going to transform code, you should minimise hassle of
doing so. As you said, lisp minimises the differences.

* I'd consider the argument on page 60, "density" of Paul Graham's "On
Lisp" to apply (yes, it applies to functions themselves too...) - "So
yes, reading a bottom-up program requires one to understand all the new
operators defined by the author. But this will nearly always be less
work than having to understand all the code that would have been
required without them".

[1]
http://schemers.org/Documents/Stand....html#%_sec_4.3
[2] http://www.ccs.neu.edu/home/dorai/mbe/mbe-lsp.html
[3] http://www.paulgraham.com/onlisp.html
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com