Home > Archive > Compilers > March 2004 > Implementation Language Choice
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Implementation Language Choice
|
|
| Kevin Albrecht 2004-03-27, 12:28 am |
| I am an experienced compiler/interpreter writer, but have always
written my compilers in C/C++ so far. As almost everyone will admit,
C/C++ are hardly ideal languages. I am considering writing my next
compiler in some other language, but I am unsure of what language to
use. What languages have others found useful as implementation
languages?
The following features are important to me in a choice for a better
implementation language:
* mature compiler(s) that produces native-code binaries
* open source - preferable, but not 100% necessary
Thanks,
Kevin Albrecht
| |
|
|
| Stephan Wienczny 2004-03-27, 12:28 am |
| Kevin Albrecht wrote:
> I am an experienced compiler/interpreter writer, but have always
> written my compilers in C/C++ so far. As almost everyone will admit,
> C/C++ are hardly ideal languages. I am considering writing my next
> compiler in some other language, but I am unsure of what language to
> use. What languages have others found useful as implementation
> languages?
You could give D a try. There have been some languages called D since
C got popular, I'm talking about Walter Bright's one. The
documentation is
available @ http://www.digitalmars.com/d
If you like C and C++ you will love D
It does not have a lot of completly new features, but implements the
currently used in most ones.
You have got templates, (limited) operatator overloading, design by
contract, a garbage collector etc.
> The following features are important to me in a choice for a better
> implementation language:
>
> * mature compiler(s) that produces native-code binaries
That might be a problem...
The only currently working compiler is Walter's one.
It produces native code that is quite fast, but not yet finished.
Walter wants to release 1.0 this year in March.
I'm not sure about that date.
There are different projects writing new compilers.
> * open source - preferable, but not 100% necessary
The D Frontend is open source, but the backend is closed source.
> Thanks,
> Kevin Albrecht
Maybe you want to join a D compiler project, I recently started.
Stephan Wienczny
| |
| Basile Starynkevitch [news] 2004-03-27, 12:28 am |
| Le 12-02-2004, Kevin Albrecht <kevin@albrecht.net> a écrit_:
> I am an experienced compiler/interpreter writer, but have always
> written my compilers in C/C++ so far. As almost everyone will admit,
> C/C++ are hardly ideal languages. [...]
>
> The following features are important to me in a choice for a better
> implementation language:
>
> * mature compiler(s) that produces native-code binaries
> * open source - preferable, but not 100% necessary
Ocaml fits the bill and has been used in several compiler-like related
projects. If you are interested to implement a compiler for some
variant of C, you might reuse part of CIL.
see www.ocaml.org or caml.inria.fr for OCAML
see http://manju.cs.berkeley.edu/cil/ for CIL
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net
aliases: basile<at>tunes<dot>org = bstarynk<at>nerim<dot>net
8, rue de la Faïencerie, 92340 Bourg La Reine, France
| |
| Joachim Durchholz 2004-03-27, 12:28 am |
| Kevin Albrecht wrote:
> What languages have others found useful as implementation
> languages?
Functional languages have been reported to be exceedingly strong in this
area. You essentially define a few helper functions, then the parser is
a simple transliteration of the grammar rules. (The keywords to google
for are "parsing combinators".)
In particular, it seems to be easy to attach different things to the
parsers (for "things" say prettyprinters, code transformers, compilers,
and whatnot).
The downside is that the approach will produce only recursive-descent
grammars. Good enough if you're free to define the syntax, often
unworkable if the syntax is a given.
> The following features are important to me in a choice for a better
> implementation language:
>
> * mature compiler(s) that produces native-code binaries
> * open source - preferable, but not 100% necessary
Most FPLs will qualify (nowadays).
Regards,
Jo
--
Currently looking for a new job.
| |
| Kevin Albrecht 2004-03-27, 12:28 am |
| I was asked what type of compiler I am making. It is
for a statically-typed language with mixed imperative/
declarative features.
-- Kevin Albrecht
| |
| Lex Spoon 2004-03-27, 12:29 am |
| Joachim Durchholz <joachim.durchholz@web.de> writes:
> Kevin Albrecht wrote:
>
>
> Functional languages have been reported to be exceedingly strong in this
> area. You essentially define a few helper functions, then the parser is
> a simple transliteration of the grammar rules. (The keywords to google
> for are "parsing combinators".)
> In particular, it seems to be easy to attach different things to the
> parsers (for "things" say prettyprinters, code transformers, compilers,
> and whatnot).
Additionally, functional languages seem effective for the later stages
of a compiler. Most compilers can be written as a long series of
transformations of representations starting with source code and
ending with binary code; each one of these representations can usually
be represented as tree structures or as lists of tree structures.
Functional languages are *terrific* at converting one list of trees
into another list of trees.
The feature that is especially nice is the pattern-matching switch
statement that most functional languages include. It lets you break
encapsulation and dig multiple levels into the structure of a tree, so
that, say, a move from a register to a register, is handled in a
separate branch from a move from a memory location to a memory
location that is specified directly in a register.
But also garbage collection is a key advantage. As well, it is nice
to have higher-order functions themselves can prove useful; you can
write little routines that will wander over a tree structure and apply
a function you specify. But many languages have these things
nowadays. The relatively unusual feature is the deeply reaching
switch statements.
I sometimes wonder if functional language designers know about any
programs *other* than compilers. :) However you take this, it is
clear that the ML family has proven a very effective implementation
language for compilers.
-Lex
| |
| Joachim Durchholz 2004-03-27, 12:29 am |
| Lex Spoon wrote:
> The feature that is especially nice is the pattern-matching switch
> statement that most functional languages include.
Agreed, pattern matching is nice. ("Pattern matching" as in "use a
typesafe union type, and recognize the variant at hand and disassemble
the record elements, all in a single construct"; the pattern matching
we're talking about here is unrelated to regular expressions or image
processing.)
Though it's beyond me why this never made it into mainstream imperative
languages; it's exceedingly useful.
> But also garbage collection is a key advantage.
Agreed, and, again, useful beyond functional programming.
> I sometimes wonder if functional language designers know about any
> programs *other* than compilers. :)
There's indeed a bias in that direction.
However, I think that's normal for languages that haven't broken out
into the mainstream.
> However you take this, it is clear that the ML family has proven a
> very effective implementation language for compilers.
Not just ML languages. There's the Haskell family, and there are less
well-known languages like Oz, Alice, or Scala.
All are good at implementing languages. It's a consequence of having
higher-order functions - it makes it so easy to implement languages that
the language designers often go the extra step to make it *really* easy.
This has been creating a bias for languages that are used for compiling
themselves and not much else, but ML and Haskell are definitely beyond
that stage now.
Regards,
Jo
--
Currently looking for a new job.
| |
| Gabriel Dos Reis 2004-03-27, 12:29 am |
| Joachim Durchholz <joachim.durchholz@web.de> writes:
| Lex Spoon wrote:
| > The feature that is especially nice is the pattern-matching switch
| > statement that most functional languages include.
|
| Agreed, pattern matching is nice. ("Pattern matching" as in "use a
| typesafe union type, and recognize the variant at hand and disassemble
| the record elements, all in a single construct"; the pattern matching
| we're talking about here is unrelated to regular expressions or image
| processing.)
|
| Though it's beyond me why this never made it into mainstream imperative
| languages; it's exceedingly useful.
Probably because people have been using the more general Visitor Pattern?
-- Gaby
| |
| Joachim Durchholz 2004-03-27, 12:29 am |
| Gabriel Dos Reis wrote:
> Joachim Durchholz <joachim.durchholz@web.de> writes:
>
>
> Probably because people have been using the more general Visitor
> Pattern?
It's just the other way round: pattern matching is more general than
Visitor.
Visitor is for iterating through collections of polymorphic data.
Pattern matching is for inspecting and acting upon a single instance
of some polymorphic data. Actually it's the reverse of OO-style
polymorphism: in OO, the set of operations is fixed and the set of
types is extensible; pattern matching assumes a fixed set of types and
an extensible set of operations.
Here's how pattern matching works:
1. You define a union of record types. Something like
Tree = Leaf value: int | Node left: Tree; right: Tree
which translates to a record with a "selector" field that distinguishes
the "int" case from the "left: Tree; right: Tree" case, and a memory
area that either contains the int or the two Tree fields. (I tried to
write this in C, but the syntax got so riddled with irrelevant detail
that I decided to stick with the prose description.)
2. Whenever your code does something with a Tree data object, it does
pattern matching, like this:
case t of
Leaf i: ... // Do something with i.
Node l, r: ... // Do something with l and r.
end
The "i" above gets bound to the "int" field in the Tree (Leaf variant)
record.
The "l" and "r" above get bound to the "left" and "right" fields in the
Tree (Node variant) record.
The nice thing about pattern matching is that it does in one fell swoop
what you want anyway: detect which case is at hand, and access the
fields in it in some handy local variables because you don't want to
write t->value, t->left, t->right all over the place.
Even nicer is the following: accessing t->value in the "l, r" branch
would be perfectly valid C, and require a run-time check in Pascal; with
pattern matching, "i" is out of scope in the "l, r" branch and vice
versa, so any accidental misuse will be caught at compile time.
Regards,
Jo
--
Currently looking for a new job.
| |
| Lauri Alanko 2004-03-27, 12:29 am |
| Gabriel Dos Reis <gdr@integrable-solutions.net> virkkoi:
> Joachim Durchholz <joachim.durchholz@web.de> writes:
[On pattern matching]
> | Though it's beyond me why this never made it into mainstream imperative
> | languages; it's exceedingly useful.
>
> Probably because people have been using the more general Visitor Pattern?
The more excruciatingly painful, you mean?
I once participated in writing a toy OQL compiler. The implementation
language was preordained to be C++. We did things by the book and used
the visitor pattern instead of "ugly" dynamic casts to discriminate
between different syntactic classes.
It was horrible, really. Having to write a trivial visit-method in
every class, and having to write all the auxiliary visitor classes
whenever the syntax tree had to be traversed. I couldn't help thinking
all the time how much simpler everything would have been had we just
used Haskell, defined a simple datatype for the abstract syntax, and
pattern matched on the terms.
I guess the code would have been shorter by a factor of ten, maybe.
Lauri Alanko
la@iki.fi
| |
| Lex Spoon 2004-03-27, 12:29 am |
| Joachim Durchholz <joachim.durchholz@web.de> writes:
> Lex Spoon wrote:
>
> Agreed, pattern matching is nice. ("Pattern matching" as in "use a
> typesafe union type, and recognize the variant at hand and disassemble
> the record elements, all in a single construct"; the pattern matching
> we're talking about here is unrelated to regular expressions or image
> processing.)
>
> Though it's beyond me why this never made it into mainstream imperative
> languages; it's exceedingly useful.
Most languages seem to allow defining new variants in different parts
of the code. For example, an OO language will let you make new
subclasses at other parts of the code. This makes it harder to define
and to compile (see, on topic!) the case construct.
Also, there is a downside in that you are decomposing the *concrete
represenation*. Many languages -- and especially OO languages --
treat this as a breach of encapsulation and try to prevent it.
-Lex
| |
| Torben Ægidius Mogensen 2004-03-27, 12:29 am |
| Lex Spoon <lex@cc.gatech.edu> writes:
> Joachim Durchholz <joachim.durchholz@web.de> writes:
>
>
> Most languages seem to allow defining new variants in different parts
> of the code. For example, an OO language will let you make new
> subclasses at other parts of the code. This makes it harder to define
> and to compile (see, on topic!) the case construct.
>
> Also, there is a downside in that you are decomposing the *concrete
> represenation*. Many languages -- and especially OO languages --
> treat this as a breach of encapsulation and try to prevent it.
This is only partly true: A compiler has some freedom in how the
datatype is implemented. However, I agree that it pattern-matching is
usually restricted to a single datatype (apart from polymorphism on
some fields).
This need, however, not be true. You can use Haskell-style type
classes to declare several types as instances of a type-class that use
a particular set of patterns by letting each type implement a selector
and a set of functions that provide the values bound in the pattern.
This is similar to the "views" mechanism proposed by Wadler many years
ago.
This need not be notationally cumbersome. You can let a datatype
declaration (which in Haskell-parlance means a set of constructors)
double as a type-class declaration with a default instance (which is
the concrete datatype) but allow other types to define other instances
of the same type-class.
However, in the case of abstract syntax, I don't really see much need
to hide the implementation.
The addition of new constructs to a language by extending the abstract
is more relevant and not as obvious to solve. The usual solution is
to find every place where pattern-matching is made on the syntax and
add new cases. This is not as bad as it sounds, as the type-checker
can identify the places where it is needed. And while separating each
syntactic construct into different parts of the program might seem to
aid modularity, it can make it difficult to understand the program.
Torben
| |
| Lorenzo Bettini 2004-03-27, 12:29 am |
| Lauri Alanko wrote:
> Gabriel Dos Reis <gdr@integrable-solutions.net> virkkoi:
>
>
> [On pattern matching]
>
>
> The more excruciatingly painful, you mean?
>
> I once participated in writing a toy OQL compiler. The implementation
> language was preordained to be C++. We did things by the book and used
> the visitor pattern instead of "ugly" dynamic casts to discriminate
> between different syntactic classes.
>
> It was horrible, really. Having to write a trivial visit-method in
> every class, and having to write all the auxiliary visitor classes
> whenever the syntax tree had to be traversed. I couldn't help thinking
> all the time how much simpler everything would have been had we just
> used Haskell, defined a simple datatype for the abstract syntax, and
> pattern matched on the terms.
Hi,
I think in these cases the best thing would be to have double dispatch
in the language, for this reason I implemented doublecpp.
Doublecpp is a preprocessor for C++ that handles a new linguistic
construct for defining branches of a multi-method. The "right" branch
of such a method will be selected dynamically at run-time according to
the actual type of the object on which the method is invoked and to the
actual type of the first argument: double dispatch.
http://www.lorenzobettini.it/softwa...ecpp/index.html
Doublecpp is free software; you are free to use, share and modify it
under the terms of the GNU General Public License (see COPYING).
hope this helps
Lorenzo
--
+-----------------------------------------------------+
| Lorenzo Bettini ICQ# lbetto, 16080134 |
| PhD in Computer Science |
| Dip. Sistemi e Informatica, Univ. di Firenze |
| Tel +39 055 4796741, Fax +39 055 4796730 |
| Florence - Italy (GNU/Linux User # 158233) |
| |
| Joachim Durchholz 2004-03-27, 12:29 am |
| Lorenzo Bettini wrote:
>
> I think in these cases the best thing would be to have double
> dispatch in the language,
Note that double dispatch doesn't solve the extendibility problem
that's inherent in such situations, it just moves it from technical
context to social context (which doesn't make it easier to solve).
I.e. if you have NxM possibilities, it's easy to keep the N
possibilities extendible, and it's easy to keep the M possibilities
extendible, but you get into trouble as soon as you try to keep both
extendible. You get into technical difficulties (module independence,
or method selection strategies that will break in some cases) and/or
social difficulties (if those who are responsible for extending in the
N direction are different from those of extending in the M direction -
imagine than N is hardware variants offered by IBM and M is software
variants offered by Microsoft and you'll see what I mean).
Oh, and preprocessors are evil (they don't compose).
:-)
Regards,
Jo
--
Currently looking for a new job.
| |
| Mayan 2004-03-27, 12:29 am |
| Kevin Albrecht wrote:
> I am an experienced compiler/interpreter writer, but have always
> written my compilers in C/C++ so far. As almost everyone will admit,
> C/C++ are hardly ideal languages. I am considering writing my next
> compiler in some other language, but I am unsure of what language to
> use. What languages have others found useful as implementation
> languages?
>
> The following features are important to me in a choice for a better
> implementation language:
>
> * mature compiler(s) that produces native-code binaries
> * open source - preferable, but not 100% necessary
Several observations, which are related to the backend
(i.e. everything after the parser).
Optimizations have to be done *fast*; even when doing whole program
compilation of very large programs (100kloc+), users expect compile
times of at best a few minutes. And for the case of compiling single
large files (say 10kloc) people expect compile times to be
considerably under a minute, even with all optimizations turned on.
With the more aggresive optimizations, memory usage will be quite
large. If you're doing whole program analysis, you could easily
exceed 0.5 GB.
Both of these suggest that you need to worry about both performance
and memory usage. Of course, if you're dealing with small compile
units, and simple optimizations, then this may not be an issue.
Also, in the backend, you don't need too many data-types. For
instance, in our current compiler, other than collection classes, we
have 29 core data-structures (i.e. IRs + symbol-table). Not only is
there no inheritance, there is no opportunity for inheritance.
We have to do a little bit of work to get strongly typed collection
classes, but its relatively straight-forward.
In the backend, there are very few unions, and no anonymous unions.
Based on previous compilers, there is one place where unions, typed or
otherwise, would have been applicable - in the representation of
constants. We get away from that by a couple of things:
- In general, the IR is "strongly-typed" - i.e. if a constant appears at
some point, it can only be of a particular type.
- All constants are represented as strings. Thus, if an add has an
immediate argument, 4, it is represented by the string "4". This allows
us to treat all constants identically for the most part, at the cost of
having to convert from the string representation to the actual constant
when needed (though, of course, we do optimize this conversion).
Traversals are broken into two steps: we walk the IR, gathering all
elements into an array, and then iterate over the array. Thus, we do
things like:
n = size_graph(G);
nodes = alloca(n * sizeof(Node *));
post_order_depth_first_walk(G, n, nodes);
for( i = n-1; i >= 0; i-- ) { /* reverse post order dfs */
do_something(nodes[i]);
}
Summary: if you're worrying about the language in which to write a
compiler, you've probably got more serious problems. I've written about
7 serious compiler/interpreters/translators in C, LISP and C++, and IMO
C is the best language for writing compilers.
If I had to pick a second-best, it would probably be Ada-95.
| |
| Barak Zalstein 2004-03-27, 12:29 am |
| > Summary: if you're worrying about the language in which to write a
> compiler, you've probably got more serious problems. I've written
> about 7 serious compiler/interpreters/translators in C, LISP and
> C++, and IMO C is the best language for writing compilers.
> If I had to pick a second-best, it would probably be Ada-95.
Why not LISP? It was tried before
(http://compilers.iecc.com/comparch/article/00-10-100) and is possibly
better for concurrently handling phase ordered problems (not that I
know much about it axcept for configuring the editor). Was it the
performance, portability, or the strange data types/coding style that
drove people to other languages?
Barak.
|
|
|
|
|