Home > Archive > Compilers > July 2006 > Representing Closures in C
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Representing Closures in C
|
|
| Johan Tibell 2006-07-21, 7:02 pm |
| I'm writing a small lambda calculus interpreter in C. My abstract
syntax tree (AST) representation looks like this:
struct exp;
struct lam {
char *id;
struct exp *exp;
};
struct app {
struct exp *exp1;
struct exp *exp2;
};
enum type {
LAM,
VAR,
APP,
LIT
};
struct exp {
enum type type;
union {
struct lam *lam;
char *var;
struct app *app;
int lit;
} form;
};
Now I'm about to write my eval function which looks something like
this:
int eval(struct exp *exp)
{
switch(exp->type) {
case LIT:
return exp->form.lit;
break;
/* more cases */
}
}
When evaluating a function application I need to somehow bind the value
of the evaluated argument to the variable in the lambda abstraction.
This example will hopefully explain what I need to achieve:
((\x. x) 1)
When evaluating this call to the identity function the whole expression
should evaluate to one. I need a way to bind the value 1 to the
variable x so x can later be looked up in the body of the lambda
abstraction.
I'm aware that I have more (and probably more difficult) problems
ahead such as dealing with tail recursion, etc. but it would be nice if
I could get something up and running before improving upon it.
I imaging that I'll need:
* a way of representing the environment in which an expression should
be evaluated
* a way of representing closures
Perhaps that's the same thing?
Any good articles or papers out there? What I'm looking for is a
hands-on description of how it can be done in C.
[There's certainly been a lot of work on this. Seems to me you basically
want to carry a symbol table along with each expression. -John]
| |
| Hans Aberg 2006-07-21, 7:02 pm |
| "Johan Tibell" <johan.tibell@gmail.com> wrote:
> I'm writing a small lambda calculus interpreter in C.
> Any good articles or papers out there? What I'm looking for is a
> hands-on description of how it can be done in C.
Simon Peyton Jones has written two books on the implementation of
functional languages, both with full text available online:
http://research.microsoft.com/~simo...ers/papers.html
The book by Abelson & Sussman, "Structure..." has some implementation
stuff though written in Scheme. You might want to check out say the
implementation of Hugs <http://haskell.org/hugs/>, though this is an
interpreter of a typed, lazy functional language: Haskell.
--
Hans Aberg
| |
| Tommy Thorn 2006-07-21, 7:02 pm |
| Johan Tibell wrote:
> I'm writing a small lambda calculus interpreter in C. My abstract
> syntax tree (AST) representation looks like this:
>
> struct exp;
>
> struct lam {
> char *id;
> struct exp *exp;
> };
.... snip
> int eval(struct exp *exp) {
> switch(exp->type) {
> case LIT:
> return exp->form.lit;
> break;
....
> [There's certainly been a lot of work on this. Seems to me you basically
> want to carry a symbol table along with each expression. -John]
Do yourself a favor and read Simon Peyton Jones' "The Implementation
of Functional Programming Languages" (available for free here:
http://research.microsoft.com/~simo...-1987/index.htm)
or even more to the point, Implementing functional languages: a
tutorial
(http://research.microsoft.com/Users.../pj-lester-book).
There's a range of choices in the implementation. John's suggestion
is one option. Another one is to implement beta substitutions
literally (called "template expansion"); that is, to evaluate
((\x.e1) e2)
by deep copying(*) e1, replacing instances of variable x by the
argument (e1[e2/x]). That will work in the framework you sketched
above.
Pretty much all the remaining alternatives require some amount of
transformation on the original expression (~ compilation). Fx.
eliminating the dynamic variable lookup in the scheme suggested by
John, you'd replace variables with their de Bruin index and use it to
access the value of that variable in the closure.
Have fun,
Tommy
(*) Be careful to handle names correctly, eg. ((\x.\x.e1) e2) = \x.e1,
not \x.(e1[e2/x])
| |
| Johan Tibell 2006-07-22, 7:01 pm |
| > > [There's certainly been a lot of work on this. Seems to me you basically
I threw together a crude and probably incorrect interpreter tonight. I
store bound variables in a linked list that's carried along with the
closure.
struct env {
char *id;
struct value val;
struct env *next;
};
struct closure {
struct env *env; /* free variables */
struct exp *exp; /* body to be evaluated */
char *id; /* the variable to bind when evaluated */
};
This is probably not very efficient (if you know of a better way
please tell me) but it also allows expressions to share the "tail" of
the environment. (Imagine a tree turned upside down.) It might be
better to store top level functions (which I have not implemented yet)
in a hashtable so they can be looked up more efficiently.
The source code is available here, it is quite short.
http://www.itstud.chalmers.se/~larssont/interpreter/
It is neither garbage collected nor tail recursive at the moment.
Tommy Thorn wrote:[color=darkred]
> Do yourself a favor and read Simon Peyton Jones' "The Implementation
> of Functional Programming Languages" (available for free here:
> http://research.microsoft.com/~simo...-1987/index.htm)
> or even more to the point, Implementing functional languages: a
> tutorial
> (http://research.microsoft.com/Users.../pj-lester-book).
I've read parts of Simon Peyton Jones's book but I'm unsure about how
much overlap there is between writing an interpreter for non-strict and
strict languages. In particular, do I need graph reduction?
> There's a range of choices in the implementation. John's suggestion
> is one option. Another one is to implement beta substitutions
> literally (called "template expansion"); that is, to evaluate
>
> ((\x.e1) e2)
>
> by deep copying(*) e1, replacing instances of variable x by the
> argument (e1[e2/x]). That will work in the framework you sketched
> above.
Interesting approach, I've never actually thought about performing
beta-substitution in that way.
> Pretty much all the remaining alternatives require some amount of
> transformation on the original expression (~ compilation). Fx.
> eliminating the dynamic variable lookup in the scheme suggested by
> John, you'd replace variables with their de Bruin index and use it to
> access the value of that variable in the closure.
I did quick search for "de Bruin index" but it came back empty. What is
a "de Bruin index"? Also, I'm not opposed to doing some
transformations, I'll probably do both closure conversion and CPS
transformation as suggested here:
http://www.iro.umontreal.ca/%7Ebouc...minutes-en.html
> Have fun,
> Tommy
Thanks.
> (*) Be careful to handle names correctly, eg. ((\x.\x.e1) e2) = \x.e1,
> not \x.(e1[e2/x])
I think my interpreter handles this correctly. Is this a problem with
both strict and non-strict interpreters?
| |
| SM Ryan 2006-07-23, 7:01 pm |
| "Johan Tibell" <johan.tibell@gmail.com> wrote:
# > > [There's certainly been a lot of work on this. Seems to me you basically
# > > want to carry a symbol table along with each expression. -John]
#
# I threw together a crude and probably incorrect interpreter tonight. I
# store bound variables in a linked list that's carried along with the
# closure.
A closure is a dynamic stack of scopes; a scope is a mapping (or
dictionary) from identifiers to values. There are many ways to
implement a scope mapping: linked list of identifiers, self organizing
list, sorted tree, hash table, etc. If you don't need a sorted
traversal, you can get a hash table implementation out of a library on
many systems.
The stacked scopes is just the innermost scope defining an identifier
is used. You can maintain an actual stack of scopes; or you can have a
scope be a dictionary of an identifier to a stack of values, or you
can smash the scope stack. One advantage of a stack of scopes is ease
of sharing outer scopes amongst closures: you then end up with a tree
of scopes, each closure being a path from a leaf to the root. This
also means you can safely use reference counts to decide when a scope
is garbage.
The really big issue is when the identifier value is a variable on the
procedure stack, because procedure stack frames are automatically
recoverred on procedure exit of most languages unless you change the
language implementation. This doesn't occur in applicative language
because the identifier value is never a variable. Algol-68 could have
supported closures because you can use heap variables instead of local
variables.
--
SM Ryan http://www.rawbw.com/~wyrmwif/
What kind of convenience store do you run here?
| |
| Hans Aberg 2006-07-23, 7:01 pm |
| <johan.tibell@gmail.com> wrote:
> I did quick search for "de Bruin index" but it came back empty. What is
> a "de Bruin index"?
One replaces the variable names in a lambda-expressions with a
non-negative integer: in each occurrence, the number of intermediate
lambda-heads between the lambda-head that the variable is bound to,
and the variable_occurrence itself. The point is that variable
substitution needs no variable relabeling.
However an elegant method, I am told that it hard to use in debugging,
as it is difficult for humans to interpret. So in actual functional
language implementations, it has not been used so much.
--
Hans Aberg
| |
| Tommy Thorn 2006-07-25, 8:01 am |
| Johan Tibell wrote:
> I threw together a crude and probably incorrect interpreter tonight.
....
> This is probably not very efficient (if you know of a better way
> please tell me) but it also allows expressions to share the "tail" of
> the environment.
Well, that really depends on how much work you're willing to do.
Without some degree of "compilation" variants of this or explicit
rewriting (as I suggested) are about the only solutions.
Beware though, that a naive dictionary approach suffers from space
leak! (Imagine fx. a pair of mutally tail-recursive functions that
continues to grow the argument list). You need to [occationally] prune
duplicates out of the dictionary to avoid this.
> It might be
> better to store top level functions (which I have not implemented yet)
> in a hashtable so they can be looked up more efficiently.
IMO that would be like supercharging a Geo Metro (or a Trabant).
You're better off expensing the energy on a fundamentally better model.
Check out Xavier Leroy's Zinc model
http://citeseer.ist.psu.edu/leroy90zinc.html (Oh, you may want to know
the classics texts also, John C. Reynolds "Definitional interpreters
for higher-order programming languages",
http://portal.acm.org/citation.cfm?...CFTOKEN=6730910
but Zinc is a more efficient model)
> The source code is available here, it is quite short.
> http://www.itstud.chalmers.se/~larssont/interpreter/
Your APP fails to verify that it's actually applying a function.
> I've read parts of Simon Peyton Jones's book but I'm unsure about how
> much overlap there is between writing an interpreter for non-strict and
> strict languages. In particular, do I need graph reduction?
There is an *IMMENSE* overlap between the implementation of strict and
lazy functionally languages, though highly optimized implementation
have different trade-offs naturally. This was not obvious at all in
the beginning, with the former coming from an environment model and the
latter from graph reduction/super combinators. At the end of the day
however, the best implementations used the exact same ideas and
concepts. The difference, the "reduction" in "graph reduction" is the
updating of the closure, which is not done for strict languages.
IMO, Peyton Jones' books are outstandingly readable, and appropriate
reading even if you're ultimately more interested in strict languages.
> Interesting approach, I've never actually thought about performing
> beta-substitution in that way.
It is actually the most obvious implemention when you're coming from a
graph view. It inefficient sure, but not as bad as you might think if
your allocation is fast (as in incrementing a pointer).
>
> I think my interpreter handles this correctly. Is this a problem with
> both strict and non-strict interpreters?
Yes (this problem is known as "name capture").
Tommy
| |
| Tommy Thorn 2006-07-25, 8:01 am |
| Hans Aberg wrote:
> However an elegant method, I am told that it hard to use in debugging,
> as it is difficult for humans to interpret. So in actual functional
> language implementations, it has not been used so much.
This is really no different than the problem of tracking source level
variables in the optimized assembly. A tedious task for humans, but
that's what compilers use symbol debugging entries (eg. DWARF2) for.
If it sees little application in practice it may have more to do with
the fact that it's still only a step on the road to an efficient
implementation.
Tommy
| |
| Hans Aberg 2006-07-25, 7:02 pm |
| <tommy.thorn@gmail.com> wrote:
to use in debugging,[color=darkred]
>
> This is really no different than the problem of tracking source level
> variables in the optimized assembly. A tedious task for humans, but
> that's what compilers use symbol debugging entries (eg. DWARF2) for.
To begin with, I am just quoting some Hugs/Haskell developers. I am
not saying not to use it. But I would think that implementing a
correct functional language is considerably more difficult than
working with assembly code, simply because one works on a higher
mathematical structural level with less direct intuition.
> If it sees little application in practice it may have more to do
> with the fact that it's still only a step on the road to an
> efficient implementation.
The best way to find out, I gather, is to try it. If debugging becomes
difficult, it comes to my mind, one might perhaps try a hybrid, where
the computer uses de_Bruijn indexing, and variable names are somehow
retained for the user.
If you go beyond a simple functional language, towards theorem provers
which binds variables in several types of ways, then definitely, getting
it correct is a major hurdle. Then, humans must be able to efficiently
read the formulas, and verify them by hand, otherwise efficient debugging
cannot expect to take place.
--
Hans Aberg
|
|
|
|
|