For Programmers: Free Programming Magazines  


Home > Archive > Compilers > June 2007 > A handful of LISP questions









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author A handful of LISP questions
tactics

2007-06-18, 10:11 pm

Hello all~

Recently, I have become interested writing a personal dialect of LISP
in C. Over the last month, I actually managed to get fairly stable
version working and I wrote a small word-guessing game in it and some
common recursive functions (eg, I got it to print the primes between 2
and 100). It was really fun to do, and I want to take it a step
further. However, I have some questions and advice to ask here.

My first question, and the most important to me for now, I think, is
what is the best way to write a parser for a LISP? It was only out of
the grace of the C Gods that I got my current parser working. I have a
nice method for breaking a raw string into tokens (which I was cute
about, and instead of returning an array of tokens, it returns a
cons'ed list of C structure LISP-objects). But once I have the tokens,
I do some pretty bad black magic to get the final list structure. It's
something like, whenever I come to a parenthesis, I search for the
close, split the string, and then parse the stuff inside the paren and
then the stuff outside the paren. It's really bad!

I was wondering from a theoretical standpoint, what kind of grammar is
a LISP grammar? Obviously, it's a straightforward CFG, but is it
LL(1)? I took a class in Programming Languages in college, but nothing
more than how to write a calculator in JavaCC. The web resources for
LL and LR parsing methods are quiet pathetic too (someone needs to
clean up their LL algorithm explanation really badly). Can anyone
point me in the right direction here? What would be a good technique
to look into for this job?

My next question is about garbage collection. I have a nice mark and
sweep algorithm written, but the big issue here is I don't know when
to actually DO garbage collection. Right now, I just call it at the
end of my program, but that is hardly useful. Do I just run it
periodically in a separate thread? Do I set a recurring signal timer?
In either case, what is the best way to ensure I still have access to
the current environment?

If I think of any more questions, I'll post them here. I hope someone
can give me some good advice on these topics.
Gene

2007-06-19, 10:10 pm

On Jun 18, 8:29 pm, tactics <tactic...@gmail.com> wrote:
> Hello all~
>
> Recently, I have become interested writing a personal dialect of LISP
> in C. Over the last month, I actually managed to get fairly stable
> version working and I wrote a small word-guessing game in it and some
> common recursive functions (eg, I got it to print the primes between 2
> and 100). It was really fun to do, and I want to take it a step
> further. However, I have some questions and advice to ask here.


Yes. It's a great exercise.

> My first question, and the most important to me for now, I think, is
> what is the best way to write a parser for a LISP? It was only out of
> the grace of the C Gods that I got my current parser working. I have a
> nice method for breaking a raw string into tokens (which I was cute
> about, and instead of returning an array of tokens, it returns a
> cons'ed list of C structure LISP-objects).


For generality you're better off pulling tokens from the start of a
stream. The reader can need to handle very big data in a serious lisp
implementaiton.

> But once I have the tokens,
> I do some pretty bad black magic to get the final list structure. It's
> something like, whenever I come to a parenthesis, I search for the
> close, split the string, and then parse the stuff inside the paren and
> then the stuff outside the paren. It's really bad!


> I was wondering from a theoretical standpoint, what kind of grammar is
> a LISP grammar? Obviously, it's a straightforward CFG, but is it
> LL(1)? I took a class in Programming Languages in college, but nothing
> more than how to write a calculator in JavaCC. The web resources for
> LL and LR parsing methods are quiet pathetic too (someone needs to
> clean up their LL algorithm explanation really badly). Can anyone
> point me in the right direction here? What would be a good technique
> to look into for this job?


You might enjoy looking at xlisp. http://almy.us/xlisp.html . It's
pretty much what you are trying to do.
You can learn something about compiling a small lisp with
http://hedgehog.oliotalo.fi/

Lisp grammar is trivially LL(1). Every lisp reader I've ever seen
parses by recursive descent. The lexer returns atoms, parens, and
dots. The parser is something like this:

function read
next_token = lex;
if next_token = '(' then
return read_list
else if next_token = ')' then
return ')' ;; only "legal" when returned to read_list
else // atom
return next_token;
end read

function read_list
list = tail = nil
loop
next_atom = read;
if next_atom = ')' return list
if next_atom = '.'
tail->cdr = read;
if read /= ')' error "malformed dotted list"
return list;
end if
;; add new item to tail of list
if tail = nil
list = tail = cons(next_atom, nil);
else
tail->cdr = cons(next_atom, nil)
tail = tail->cdr
end if
end loop;
end read_list

In some lisps, the "read_list" above is implemented as a macro
character function for the '(' character.

>
> My next question is about garbage collection. I have a nice mark and
> sweep algorithm written, but the big issue here is I don't know when
> to actually DO garbage collection. Right now, I just call it at the
> end of my program, but that is hardly useful. Do I just run it
> periodically in a separate thread? Do I set a recurring signal timer?
> In either case, what is the best way to ensure I still have access to
> the current environment?


The usual idea is to allocate an "arena", allocate from that until
it's full, thebn garbage collect. If the arena is still full after
collection, increase its size.
Nils M Holm

2007-06-19, 10:10 pm

On 2007-06-19, tactics <tactics40@gmail.com> wrote:
> My first question, and the most important to me for now, I think, is
> what is the best way to write a parser for a LISP? It was only out of
> the grace of the C Gods that I got my current parser working. I have a
> nice method for breaking a raw string into tokens (which I was cute
> about, and instead of returning an array of tokens, it returns a
> cons'ed list of C structure LISP-objects). But once I have the tokens,
> I do some pretty bad black magic to get the final list structure. [...]


Most LISPs do not have a full-blown parser, but a so-called reader,
which is typically implemented in the READ procedure. The same
procedure that reads S-expressions is also used to read complete LISP
programs.

Converting text to tokens seems to be a redundant step to me. Once
you have read the text of a token, you know what it is, and you can
generate the internal representation in an instant. LISP variables
have no types, so when you read a symbol, you generate a symbol.
When you read a number, you generate a number. To read the members
of a list you call READ recursively.

> I was wondering from a theoretical standpoint, what kind of grammar is
> a LISP grammar?


sexpr := atom
| list

list := '(' members ')'
| '(' ')'

members := sexpr
| sexpr members

atom := symbol
| number
| string
... add more atoms to your taste ...

> Obviously, it's a straightforward CFG, but is it LL(1)?


When you refactor 'members' and 'list', the grammar is LL(1). In fact,
LISP is so easy to read (by a reader procedure) that I have never
bothered to write a lexer or a parser for it. For example, a READ
procedure for Scheme may be implemented in a few hundred lines of
C. See [1] for some actual code.

> My next question is about garbage collection. I have a nice mark and
> sweep algorithm written, but the big issue here is I don't know when
> to actually DO garbage collection. Right now, I just call it at the
> end of my program, but that is hardly useful. Do I just run it
> periodically in a separate thread? Do I set a recurring signal timer?
> In either case, what is the best way to ensure I still have access to
> the current environment?


Mark and sweep is a good choice for a simple LISP. Just make sure
that you use a constant-space version, so that it can deal with long
or deep structures gracefully.

A naive approach would be to run the GC when you run out of memory.
A much better idea would be to run it when your memory pool runs
low (but not dry). This is because when you wait until you are out
of memory, collections occur at a quickly increasing rate, which
degrades performance dramatically.

Concurrent GC is only useful if you do not want the collector to
interrupt program execution. See [2] for an extensive coverage of GC
algorithms.

[1] http://t3x.org/bits/s9fes/
[2] Jones, Lins
"Garbage Collection"
Wiley & Sons, 1996

--
Nils M Holm <nmh@t3x.org> -- http://t3x.org/nmh/
[Discussions of garbage collection would be more appropriate on gclist.
See http://www.iecc.com/gclist/ -John]

Martin Rodgers

2007-06-19, 10:10 pm

tactics wrote:

> If I think of any more questions, I'll post them here. I hope someone
> can give me some good advice on these topics.


Firstly, the Scheme FAQ entry [1-4] Where can I learn about implementing
Scheme interpreters and compilers?
http://www.faqs.org/faqs/scheme-faq.../section-4.html

If you don't mind reading a little Haskell code, there's a tutorial on
writing a Scheme interpreter.
http://halogen.note.amherst.edu/~jd...l/overview.html

The parser section may be of particular interest to you.
http://halogen.note.amherst.edu/~jd...ial/parser.html

You can also learn a lot by studying a good, clean implementation. I
recommend SCM, as its written in C and isn't so heavily optimised that
the code obscures the basic concepts.
http://www-swiss.ai.mit.edu/~jaffer/SCM.html

Here's a pretty complete Scheme parser written in Perl.
http://search.cpan.org/~sfink/parro...cheme/Parser.pm
--
Martin Rodgers http://www.wildcard.demon.co.uk

Lieven Marchand

2007-06-19, 10:10 pm

tactics <tactics40@gmail.com> writes:

> I was wondering from a theoretical standpoint, what kind of grammar is
> a LISP grammar? Obviously, it's a straightforward CFG, but is it
> LL(1)? I took a class in Programming Languages in college, but nothing
> more than how to write a calculator in JavaCC. The web resources for
> LL and LR parsing methods are quiet pathetic too (someone needs to
> clean up their LL algorithm explanation really badly). Can anyone
> point me in the right direction here? What would be a good technique
> to look into for this job?


Traditionally, in Lisp (the modern language is commonly written in
lower case) the parser is called the reader and it is part of the
facilities that are available to the programmer. The Lisp toplevel is
often called the REPL (Read-Eval-Print-Loop).

In Lisp, everything is either an atom or a list. So the Lisp grammar,
to a first approximation, is

sexp:= '(' (atom|sexp)* ')'

where it's the lexers job to recognize the syntax for the various
atoms such as symbols (FOO, COMMON-LISP:LENGTH), numbers (12, -5,
3.14e0), characters (#\a, #\Newline) etc.

It's simple enough not to need LR or even LL.

A complication is the mechanism of read-time macros. For instance, the
QUOTE facility, where you can stop evaluation of an item by quoting
it, like '(this is a literal list) is implemented in Common Lisp by
making ' a read-time macro that expand to the form (QUOTE (THIS IS A
LITERAL LIST)) where QUOTE is a special form that doesn't evaluate its
argument.

> My next question is about garbage collection. I have a nice mark and
> sweep algorithm written, but the big issue here is I don't know when
> to actually DO garbage collection. Right now, I just call it at the
> end of my program, but that is hardly useful. Do I just run it
> periodically in a separate thread? Do I set a recurring signal timer?


Generally, you can do garbage collection when an allocation fails or
when your program has used more than a predetermined amount of
memory. In a first try, avoid concurrent garbage collection because it
is fairly difficult to write the rest of your lisp system in such a
way that the garbage collecting thread always sees a consistent state
of the memory.

> In either case, what is the best way to ensure I still have access to
> the current environment?


What do you mean by this?

Andreas Hinze

2007-06-19, 10:10 pm

Hi

Do you want to implement a Lisp-1 (like Scheme) or Lisp-2 (like Common
Lisp) ?

However, if you seriously want to write a lisp interpreter then you
should have a look into

"Lisp in small pieces" by Queinnec & Callaway

which is a special book about implementation of Lisp interpreters &
compilers).

Also the book from Jones & Lins about garbage collection that was
mentioned in another answer will help you at this topic.

Simple Lisp interpreter/compiler can be found i.e. in "Paradigms of
Artificial Intelligence Programming: Case Studies in Common LISP" by
Peter Norvig or "The ANSI Common Lisp Book" by Paul Graham.

Moreover there is the newsgroup comp.lang.lisp. There you will find
i.e. the maintainers of CLISP, CMUCL, SBCL and some comercial lisp
implementors. I assume that in comp.lang.scheme you can meet the
implementators of the various scheme interpreters.

Hth
Andreas
tactics

2007-06-19, 10:10 pm

> Yes. It's a great exercise.

It sure is! I started off basically copy and pasting Paul Graham's
LISP and changing it into C code. And from there, I was able to
implement first- class functions (dynamic scope though... still need
to figure out lexical from C). I finally got to a critical point a
few ws ago where I was just able to add feature after feature and
see the results of my project. But I don't like working on messy code,
and so the last w and a half, I have avoided doing anything
further.

> You might enjoy looking at xlisp. http://almy.us/xlisp.html .
> You can learn something about compiling a small lisp with


http://hedgehog.oliotalo.fi/


> [1] http://t3x.org/bits/s9fes/

[2] Jones, Lins
"Garbage Collection"
Wiley & Sons, 1996


> http://www.faqs.org/faqs/scheme-faq.../section-4.html
> http://halogen.note.amherst.edu/~jd...l/overview.html
> http://halogen.note.amherst.edu/~jd...ial/parser.html
> http://www-swiss.ai.mit.edu/~jaffer/SCM.html
> http://search.cpan.org/~sfink/parro...cheme/Parser.pm




> A complication is the mechanism of read-time macros. For instance, the

QUOTE facility.

I have the ' expanding properly to quote (thank god). However, the
last thing I added was support for quasiquotes and unquotes. I don't
have the ` and , operators working, and that is the big motivation for
me to rewrite my parser. That and because of dotted pairs (which I
don't yet support).

Additionally, my goal is to make a more usable LISP. I come from a
Python background, and I would like to make a language with very
clean, elegant syntax. I like what ' ` and , do for LISP, and I want
to add more of it where it is appropriate. And because of that, I need
a very clean way to make changes to the language's grammar.


[color=darkred]
>What do you mean by this?


Let me rephrase. When garbage collecting with Mark and Sweep, you need
a root set of objects. This is the set of objects in immediate view of
the program. For example, the objects on the call stack or in the
current environment. I guess from what people have told me on this
list is that you call the collector at the start of the eval function
when memory exceeds a threshold. Then, in that case, do I just flag
everything in the current environment as being part of the root set?


> Do you want to implement a Lisp-1 (like Scheme) or Lisp-2 (like Common

Lisp) ?

Oh god, lisp-1 please. I am not friends with Common Lisp. It reminds
me of perl with parentheses =-) Again, I'm looking for elegance, so
I'm going to have a strong biased towards the Scheme ends of things.

> However, if you seriously want to write a lisp interpreter then you

should have a look into "Lisp in small pieces" by Queinnec & Callaway

I actually purchased this book a w and a half ago. It is very
interesting. It is a bit tricky at some parts though, and requires
you to have a fair understand the concepts in question. The part about
continuations completely blew me away (as I had never used call/cc
before). But I'll keep at it, even if it takes me a while.

Thank you everyone who replied. You have all been very helpful. I hope
to make good progress with this over the next few months... or
whenever I have time =-)
Gene

2007-06-20, 10:09 pm

> Let me rephrase. When garbage collecting with Mark and Sweep, you
> need a root set of objects.


Indeed! With any kind of garbage collection.

> This is the set of objects in immediate view of
> the program.


The collection of objects that might be accessed by the program at
some time in the future form a (usually disconnected) graph with
objects as nodes and pointers as edges. The root set is some set of
pointers from which each node in the graph can be reached.

> For example, the objects on the call stack or in the
> current environment.


> I guess from what people have told me on this
> list is that you call the collector at the start of the eval function
> when memory exceeds a threshold. Then, in that case, do I just flag
> everything in the current environment as being part of the root set?


Where you must look for roots depends on your implementation and
environment. It can be tricky and indeed it can depend on where you
call the collector. To your list you'll have to add implementation
language variables that may hold pointers. This is likely to include
processor registers. Depending on how you define success, just
calling gc at the start of eval has a certain lack of appeal. You can
always invent scenarios where the lisp must abort because memory is
exhausted, when a gc would allow the program to continue. Therefore
you'd like a design where gc can be called inside the allocator "just
in time" to prevent an ugly death.
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com