For Programmers: Free Programming Magazines  


Home > Archive > Functional > May 2007 > Language features for compiler writing was: Java compiler courses









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Language features for compiler writing was: Java compiler courses
Chris F Clark

2007-04-27, 10:03 pm

Again, let me preface my comments, by saying I think a functional
programming language like ML or Haskell is probably a best language
for a first compiler construction course (and perhaps a best first
language in general). Having said that, I want to backtrack and argue
some other points (and in the end ask a question).

Not to pick on this particular post, but it brings up issues I want to
address, so it's the candidate of choice.

torbenm@app-7.diku.dk (Torben Ægidius Mogensen) writes:

> Hans-Peter Diettrich <DrDiettrich1@aol.com> writes:
>
>
> C is a horrible language for writing compilers due to bad
> infrastructure for tree and list structures (i.e., GC, polymorphism
> and pattern matching).


You can get polymorphism, by using C++ as your C. You should be able
to get pattern matching from your compiler-compiler (or other) add-on
tool. I don't know that you can in many cases, but you should be able
to get it, and if you can't that's a fault in the tool chain.

That leaves GC. You can get GC by going to Java (or C# or even
"managed C++"). More importantly, you can get poor-person's GC in a
variety of means (the Boehm collector) or a variety of coding
conventions. Now, all of those solutions are stop-gap and lack
something that real built-in GC gives one--essentially worry-free
programming (at least worry-free at one level).

Here is the point I want to debate/discuss. What C and its semantic
(if not syntactic) cousins Pascal and PL1/G have is the ability to
create a modify complex graph structures (with complex vertexes and
edges) in place (i.e. by side-effect). (This ability exists in all
languages where one can explicitly manipulate pointers and not just
lists.) Now this may be a dubious advantage, but it does make certain
things work well. There are times I would trade GC for this
advantage. Now, perhaps this just shows my age. Moreover, I don't
like trading pointers for higher-order-functions.

Moreover, since long practice has shown that most people (me included)
are not competent at mantaining the discipline to write bug-free
programs when using that advantage, perhaps it is a good thing to have
banned. However, it is the model that the C-level programming
languages support. Perhaps, there is a monad in Haskell that allows
one to write the same code in a more type-secure way, but unless there
is, there are still some advantages to languages with C's disregard
for safety that let you write what you need.

On the other hand, people have known how to simulate the feature since
at least the time of FORTRAN. One simply uses integers as indexes
into parallel arrays. One array for each different field (member) of
the vertex (or edge) type. In fact, some code I'm hacking on right at
the moment uses exactly that technique. Plus, one can make the arrays
be arrays of structures and thus keep them automatically in sync.
However, there is something missing in this method. Part of that is
that the direct reference syntax isn't quite "pointer-like" (and that
could be fixed by syntactic sugar). However, I think the real issue is
that the elements are not independent objects in themselves. That is,
the GC will not come along and say index 10 is no longer referenced
and we can eliminate (collect) array[10].

So, this is the question, is there an FP technique that allows
inidividual elements of an array (list, bag, someother roughly
equivalent structure) to be addressed and arbitrary (i.e. cyclic)
graphs to be constructed (and rearranged) on-the-fly, but where
unreferenced elements get collected? If so, where can I learn about
it?

-Chris

****************************************
*************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

Torben Ægidius Mogensen

2007-04-27, 10:03 pm

Chris F Clark <cfc@shell01.TheWorld.com> writes:

> Again, let me preface my comments, by saying I think a functional
> programming language like ML or Haskell is probably a best language
> for a first compiler construction course (and perhaps a best first
> language in general). Having said that, I want to backtrack and argue
> some other points (and in the end ask a question).
>
> Not to pick on this particular post, but it brings up issues I want to
> address, so it's the candidate of choice.
>
> torbenm@app-7.diku.dk (Torben Ægidius Mogensen) writes:
>
>
> You can get polymorphism, by using C++ as your C. You should be able
> to get pattern matching from your compiler-compiler (or other) add-on
> tool. I don't know that you can in many cases, but you should be able
> to get it, and if you can't that's a fault in the tool chain.


These are all true, but I find that "it's not in the language, but
external tools might/do have it" is not a very good defense for a
language.

> That leaves GC. You can get GC by going to Java (or C# or even
> "managed C++"). More importantly, you can get poor-person's GC in a
> variety of means (the Boehm collector) or a variety of coding
> conventions.


True. Andrew Appel uses this appraoch in "Modern Compiler
Implementaion in C". It has a bit of the above flavour, though.

> Here is the point I want to debate/discuss. What C and its semantic
> (if not syntactic) cousins Pascal and PL1/G have is the ability to
> create a modify complex graph structures (with complex vertexes and
> edges) in place (i.e. by side-effect). (This ability exists in all
> languages where one can explicitly manipulate pointers and not just
> lists.) Now this may be a dubious advantage, but it does make certain
> things work well. There are times I would trade GC for this
> advantage. Now, perhaps this just shows my age. Moreover, I don't
> like trading pointers for higher-order-functions.


SML and O'Caml both have updatable pointers, so you can build and
manipulate graph structures to your heart's content. I agree that
Haskell is a bit lacking here, but people find other structures to
use. Chris Okasaki's book shows a variety of efficient fully
functional data structures.

> So, this is the question, is there an FP technique that allows
> inidividual elements of an array (list, bag, someother roughly
> equivalent structure) to be addressed and arbitrary (i.e. cyclic)
> graphs to be constructed (and rearranged) on-the-fly, but where
> unreferenced elements get collected? If so, where can I learn about
> it?


In SML and similar langauges with updatable references, you can build
and manipulate graph nodes and edges directly. If a part of the graph
becomes unreachable from the root set, it is GC'ed. In Haskell, this
gets a bit more tricky (which is why I use SML for compilers).

Torben
Hans-Peter Diettrich

2007-04-27, 10:03 pm

Chris F Clark wrote:

> You can get polymorphism, by using C++ as your C...
> That leaves GC...


I don't know whether polymorphism is a requirement for writing
compilers, some opinions deny just the requirement of OOP.

About GC, it can come in handy, but IMO if you don't care about memory
(leaks...), you don't need it, and if you care, you shouldn't use it ;-)


> Here is the point I want to debate/discuss. What C and its semantic
> (if not syntactic) cousins Pascal and PL1/G have is the ability to
> create a modify complex graph structures (with complex vertexes and
> edges) in place (i.e. by side-effect). (This ability exists in all
> languages where one can explicitly manipulate pointers and not just
> lists.)


I prefer to distinguish between pointers and references, as C++ does.
Pointer arithmetic is dangerous, references are a requirement for OOP,
and are safe unless referenced objects are erroneously destroyed.
Leaving the destruction to GC is not a replacement for proper object
management ;-)

I think that we don't need another discussion of my ;-) opinions.


> So, this is the question, is there an FP technique that allows
> inidividual elements of an array (list, bag, someother roughly
> equivalent structure) to be addressed and arbitrary (i.e. cyclic)
> graphs to be constructed (and rearranged) on-the-fly, but where
> unreferenced elements get collected? If so, where can I learn about
> it?


Such a usable techique just is mark-sweep. Where "traditional" garbage
collection must know about the location of references, in your
index-based model it must know about the location of indices. In all
cases RTTI is a requirement for GC, and type safety at least is desireable.

One variation of GC uses reference counting for the determination, when
an object becomes unused. This model has some problems, e.g. with
circular references, but it can be implemented even without RTTI
support, and can be made work also with cyclic references.

Literature? Sorry, I cannot recommend specific literature. The
documentation of the Boehm GC may contain some helpful information.
Reference counting should be self-explanatory...

DoDi
Lauri Alanko

2007-04-29, 4:13 am

Chris F Clark <cfc@shell01.TheWorld.com> wrote:
> So, this is the question, is there an FP technique that allows
> inidividual elements of an array (list, bag, someother roughly
> equivalent structure) to be addressed and arbitrary (i.e. cyclic)
> graphs to be constructed (and rearranged) on-the-fly, but where
> unreferenced elements get collected? If so, where can I learn about
> it?


All functional programming languages support bona fide mutable graphs,
but if you want a "true FP technique", then you may find the following
paper interesting:

http://www.eecs.harvard.edu/~nr/pub...g-abstract.html

An Applicative Control-Flow Graph Based on Huet's Zipper

Norman Ramsey and João Dias

We are using ML to build a compiler that does low-level
optimization. To support optimizations in classic imperative
style, we built a control-flow graph using mutable pointers
and other mutable state in the nodes. This decision proved
unfortunate: the mutable flow graph was big and complex, and
it led to many bugs. We have replaced it by a smaller,
simpler, applicative flow graph based on Huet's (1997)
zipper. The new flow graph is a success; this paper presents
its design and shows how it leads to a gratifyingly simple
implementation of the dataflow framework developed by
Lerner, Grove, and Chambers (2002).

Their graphs are not cyclic data structures, since a reference from
one basic block to another are represented by a label which must be
looked up from a map, but that doesn't seem to be a great hindrance,
since basic blocks eventually need explicit labels anyway.


Lauri
Jon Harrop

2007-04-29, 4:13 am

Torben Ægidius Mogensen wrote:
> Chris F Clark <cfc@shell01.TheWorld.com> writes:
>
> These are all true, but I find that "it's not in the language, but
> external tools might/do have it" is not a very good defense for a
> language.


Absolutely. If you're teaching compilers, the last thing you need is
the language getting in the way. The ML family of languages allow
compilers to be expressed succinctly and elegantly compared to
C/C++/Java etc.

--
Dr Jon D Harrop, Flying Frog Consultancy
The F#.NET Journal
http://www.ffconsultancy.com/produc...journal/?usenet

Hans-Peter Diettrich

2007-04-29, 4:13 am

Torben Ægidius Mogensen wrote:

> In SML and similar langauges with updatable references, you can build
> and manipulate graph nodes and edges directly. If a part of the graph
> becomes unreachable from the root set, it is GC'ed.


A note, applicable to many languages:

Graph transformations are one reason, why GC requires RTTI. Imagine
that you want to move a node around in the tree. Since it cannot
reside in two places at the same time, you have to remove it first,
and ... bang!?

That's why a mark-sweep collector must know about e.g. parameters and
local variables in subroutines, where a reference to the above node
will prevent it from being collected. When a garbage collector
searches the entire call stack for possible object references, it can
find false references, e.g. locations containing patterns that look
like an object reference. Fortunately that's quite harmless, such a
misinterpretation will never cause an object to be collected. More
annoying were unaligned references, where every byte in the stack
could be the first byte of a reference. Such guessing also disallows
copy-collection, because then a false hit will result in a modifaction
of an unrelated value, when the object is moved during GC.

DoDi
Daniel C. Wang

2007-04-29, 4:13 am

Hans-Peter Diettrich wrote:
{stuff deleted}

> In all cases RTTI is a requirement for GC, and type safety at least
> is desireable.


There are several GC schemes that allow you to get by with nothing
more than a stack map. So individual objects need not have any runtime
type info.

Addamenla4

2007-04-29, 4:53 pm

Cute blonde with REALLY big titties!
http://trully-bigtits.info/play.asp?bigtits218571
Simon Richard Clarkstone

2007-04-29, 10:03 pm

Chris F Clark wrote:
> Here is the point I want to debate/discuss. What C and its semantic
> (if not syntactic) cousins Pascal and PL1/G have is the ability to
> create a modify complex graph structures (with complex vertexes and
> edges) in place (i.e. by side-effect). (This ability exists in all
> languages where one can explicitly manipulate pointers and not just
> lists.) Now this may be a dubious advantage, but it does make certain
> things work well. There are times I would trade GC for this
> advantage. Now, perhaps this just shows my age. Moreover, I don't
> like trading pointers for higher-order-functions.


Have a look into Haskell's Data.Graph.Inductive. The inventor wrote a
paper explaining it.[1] It is a graph library based on the idea that a
graph is either empty or can be decomposed into a node, it's edges, and
another graph. You would usually do flowgraph surgery by decomposing
the flowgraph to extract the node(s) of interest (every node must be
labelled by a distinct Int number, BTW), then re-assemble the graph with
new node(s) inserted. Because you are using structure decomposition,
the bits get GCed just like they would when you decomposed a parse tree.

There are of course functions to check if the graph is disconnected,
though I doubt it would actually become so often enough to worry about.

-----
[1] Classic Haskell phenomenon: libraries that are documented only by
the research papers stemming from their development process.

--
Simon Richard Clarkstone:
s.r.cl?rkst?n?@durham.ac.uk/s?m?n.cl?rkst?n?@hotmail.com
Scheme guy says: (> Scheme Haskell)
Haskell guy says: (> Scheme) Haskell
Sofii

2007-05-05, 2:53 am

http://Celine-Dion-facestanding-mov...hp?movie=148803
Tedup43

2007-05-11, 6:42 am

http://Britney-Spears-jerking.info/...hp?movie=148803
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2009 codecomments.com