Home > Archive > Prolog > February 2006 > looking for a technique to reduce stack usage
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
looking for a technique to reduce stack usage
|
|
| mneuhaber22@berlin.com 2006-02-02, 7:04 pm |
| Hello
Just ordered O'Keefe's book on the craft of Porlog. While I am waiting,
can somebody here suggest a technique to reduce stack usage in the
following program?
With SWI prolog I get the following statistics
------------- stack usage for size 14 -----------------
?- move_disks('A','B','C',14),statistics.
Heap : 2,083,664 Bytes
Local stack : 2,048,000 1,679,360 1,671,828 Bytes
Global stack : 4,096,000 102,400 100,192 Bytes
Trail stack : 4,096,000 36,864 33,436 Bytes
3 garbage collections gained 558,532 bytes in 0.02 seconds.
1 threads, 0 finished threads used 0.00 seconds.
----------------------------------------------------------------
For size 15 the local stack is overflown.
Surely this program can be rewritten in a way that doesn't consume this
much memory. I am looking for a technique that can be generalized to
deal with other not so trivial problems. To me the challenge always is
to restate the problem in a way that addresses resource consumption.
--- hanoi towers implementation -----
% towers of hanoi, 3 pegs
% source A, destination B, intermediate C, number N
% vars must be instantiated
move_disks(A, B, _, 1) :- write('move top from '),
write(A), write(' to '), writeln(B).
% move from A to B via C
move_disks(A, B, C, N) :- N>1, N1 is N-1,
move_disks(A,C,B,N1),
move_disks(A,B,_,1),
move_disks(C,B,A,N1).
-------- end -------------------------------------
| |
| Markus Triska 2006-02-02, 7:04 pm |
| Hi!
mneuhaber22@berlin.com wrote:
>
> For size 15 the local stack is overflown.
> Surely this program can be rewritten in a way that doesn't consume this
> much memory.
Make the predicate tail recursive to avoid using so much local stack space:
move_disks1([]).
move_disks1([m(A,B,C,N)|Rest]) :-
( N =:= 1 ->
format("move top from: ~w to ~w\n", [A,B]),
move_disks1(Rest)
;
N1 is N - 1,
move_disks1([m(A,C,B,N1),m(A,B,_,1),m(C,
B,A,N1)|Rest])
).
doit1(N) :-
move_disks1([m(a,b,c,N)]),
statistics.
?- doit1(14).
Heap : 398,240 Bytes
Local stack : 2,048,000 49,152 748 Bytes
Global stack : 4,096,000 65,536 51,160 Bytes
Trail stack : 4,096,000 16,384 324 Bytes
All the best,
Markus.
| |
| mneuhaber22@berlin.com 2006-02-02, 7:04 pm |
| Markus,
thank you for this enlightening rewrite. I tested it and it works
great. Is there any way to measure the max length of the list that
holds the solutions above. In other words, you are using constant
amount of stack but does it come at the cost of a large list allocated
on the heap? Where is the technique you used explained in more detail
(textbook, web link)?
| |
| Markus Triska 2006-02-02, 7:04 pm |
| Hi!
mneuhaber22@berlin.com wrote:
> thank you for this enlightening rewrite. I tested it and it works
> great. Is there any way to measure the max length of the list that
> holds the solutions above.
There is no such list holding the solutions - I only reified the
call-stack: Instead of implicit allocation of environment frames on the
local stack, I (more or less) explicitly allocate corresponding terms on
the global stack. Garbage collection then corresponds to deallocation of
frames.
> In other words, you are using constant
> amount of [local] stack but does it come at the cost of a large list allocated
> on the heap?
Yeah, the fine-print being that the hand-crafted list, albeit large,
apparently still consumes less memory (on the global stack) than the
environment frames would in its place (on the local stack).
> Where is the technique you used explained in more detail
> (textbook, web link)?
"Modern Compiler Implementation in ML" by Andrew W. Appel discusses it
in the context of garbage collection, where a depth-first search is
implemented using an explicit stack to allocate only N words instead of
N stack frames.
All the best,
Markus.
| |
| russell kym horsell 2006-02-02, 9:56 pm |
| mneuhaber22@berlin.com wrote:
} Hello
} Just ordered O'Keefe's book on the craft of Porlog. While I am waiting,
} can somebody here suggest a technique to reduce stack usage in the
} following program?
} With SWI prolog I get the following statistics
} ------------- stack usage for size 14 -----------------
} ?- move_disks('A','B','C',14),statistics.
} Heap : 2,083,664 Bytes
} Local stack : 2,048,000 1,679,360 1,671,828 Bytes
} Global stack : 4,096,000 102,400 100,192 Bytes
} Trail stack : 4,096,000 36,864 33,436 Bytes
} 3 garbage collections gained 558,532 bytes in 0.02 seconds.
} 1 threads, 0 finished threads used 0.00 seconds.
} ----------------------------------------------------------------
} For size 15 the local stack is overflown.
} Surely this program can be rewritten in a way that doesn't consume this
} much memory. I am looking for a technique that can be generalized to
} deal with other not so trivial problems. To me the challenge always is
} to restate the problem in a way that addresses resource consumption.
.....
Sure, the TOH can be re-written to be purely iterative. It's a good
exercise. :)
| |
| student 2006-02-08, 9:29 am |
| mneuhaber22@berlin.com wrote:
> Hello
>
> Just ordered O'Keefe's book on the craft of Porlog. While I am waiting,
> can somebody here suggest a technique to reduce stack usage in the
> following program?
>
> With SWI prolog I get the following statistics
> ------------- stack usage for size 14 -----------------
> ?- move_disks('A','B','C',14),statistics.
> Heap : 2,083,664 Bytes
> Local stack : 2,048,000 1,679,360 1,671,828 Bytes
> Global stack : 4,096,000 102,400 100,192 Bytes
> Trail stack : 4,096,000 36,864 33,436 Bytes
>
> 3 garbage collections gained 558,532 bytes in 0.02 seconds.
> 1 threads, 0 finished threads used 0.00 seconds.
> ----------------------------------------------------------------
>
> For size 15 the local stack is overflown.
> Surely this program can be rewritten in a way that doesn't consume this
> much memory. I am looking for a technique that can be generalized to
> deal with other not so trivial problems. To me the challenge always is
> to restate the problem in a way that addresses resource consumption.
>
> --- hanoi towers implementation -----
> % towers of hanoi, 3 pegs
>
> % source A, destination B, intermediate C, number N
> % vars must be instantiated
> move_disks(A, B, _, 1) :- write('move top from '),
> write(A), write(' to '), writeln(B).
> % move from A to B via C
> move_disks(A, B, C, N) :- N>1, N1 is N-1,
> move_disks(A,C,B,N1),
> move_disks(A,B,_,1),
> move_disks(C,B,A,N1).
> -------- end -------------------------------------
>
My guess is that because the Prolog inference engine has no way of
"knowing"
that these two rules are mutually exclusive, it has to assume they are not,
which causes it to allocate space for what I believe are called "choice
points" that
are in fact not necessary.
I am not sure exactly what the output from 'statistics/0' means, but
the following, revised version of your code seems reasonably
well-behaved to me
move_disks(A, B, _, 1) :-
!,
write('move top from '),
write(A), write(' to '), writeln(B).
% move from A to B via C
move_disks(A, B, C, N) :-
N>1,
N1 is N-1,
move_disks(A,C,B,N1),
move_disks(A,B,_,1),
!,
move_disks(C,B,A,N1).1.59 seconds cpu time for 305,311 inferences
given the goal
?- move_disks('A','B','C',15),statistics.
2,278 atoms, 1,456 functors, 1,560 predicates, 24 modules, 35,419 VM-codes
Limit Allocated In use
Heap : 503,888 Bytes
Local stack : 2,048,000 16,384 748 Bytes
Global stack : 4,096,000 212,992 197,668 Bytes
Trail stack : 4,096,000 81,920 65,944 Bytes
1 threads, 0 finished threads used 0.00 seconds.
billh
| |
| Bart Demoen 2006-02-08, 9:29 am |
| student wrote:
> My guess is that because the Prolog inference engine has no way of
> "knowing"
> that these two rules are mutually exclusive, it has to assume they are not,
> which causes it to allocate space for what I believe are called "choice
> points" that
> are in fact not necessary.
>
> I am not sure exactly what the output from 'statistics/0' means, but
> the following, revised version of your code seems reasonably
> well-behaved to me
>
> move_disks(A, B, _, 1) :-
> !,
> write('move top from '),
> write(A), write(' to '), writeln(B).
>
> % move from A to B via C
> move_disks(A, B, C, N) :-
> N>1,
> N1 is N-1,
> move_disks(A,C,B,N1),
> move_disks(A,B,_,1),
> !,
> move_disks(C,B,A,N1).1.59 seconds cpu time for 305,311 inferences
Ah, finally someone who knows where "the clapper hangs in the clock"
(flemmish expression).
The excessive stack space usage is indeed due to the non-recognised
determinism.
The extra cut in the first clause solves that problem. Transforming it
to if-then-else has the same effect and the non-tail-recusion removal
(as proposed in other contributions) has almost no effect: it is easy to
check that (with the cut in the first clause) the recursion depth at any
point does not exceed the N in the original query - so tail recursion
optimisation should be enough - Bill's statistics figures don't really
show that (I think) because it gives statistics AFTER all computation is
finished: what one would need is a high-water mark.
I would however argue strongly against the cut in the second clause: it
is totally unnecessary.
Cheers
Bart Demoen
| |
| Bart Demoen 2006-02-08, 9:29 am |
| Markus Triska wrote:
> Make the predicate tail recursive to avoid using so much local stack space:
>
> move_disks1([]).
> move_disks1([m(A,B,C,N)|Rest]) :-
> ( N =:= 1 ->
> format("move top from: ~w to ~w\n", [A,B]),
> move_disks1(Rest)
> ;
> N1 is N - 1,
> move_disks1([m(A,C,B,N1),m(A,B,_,1),m(C,
B,A,N1)|Rest])
> ).
>
> doit1(N) :-
> move_disks1([m(a,b,c,N)]),
> statistics.
>
>
> ?- doit1(14).
>
> Heap : 398,240 Bytes
> Local stack : 2,048,000 49,152 748 Bytes
> Global stack : 4,096,000 65,536 51,160 Bytes
> Trail stack : 4,096,000 16,384 324 Bytes
You gained local stack by introducing if-then-else: the tail-recursion
doesn't help in this case.
Cheers
Bart Demoen
| |
| Bart Demoen 2006-02-08, 9:29 am |
| Markus Triska wrote:
>
>
> "Modern Compiler Implementation in ML" by Andrew W. Appel discusses it
> in the context of garbage collection, where a depth-first search is
> implemented using an explicit stack to allocate only N words instead of
> N stack frames.
For turning any predicate into a tail recursive one, you can look in
that compiler book (and not find anything about Prolog) or consider the
quite general technique of binarizing the clauses. Here is an example:
Suppose you have a predicate p/2 with a clauses:
p(I,J).
p(X,Y) :- q(X,A), r(A,B), s(B,Y).
then rewrite this as:
p(P,Q) :- p(P,Q,true).
p(I,J,Cont) :- call(Cont).
p(X,Y,Cont) :- q(X,A, r(A,B, s(B,Y, Cont))).
of course, you should do a similar thing to the clauses for r and s.
You have now a program in which every clause has one call in the body,
and so it's certainly a tail call: no environment frames are needed
for its execution. Of course, choicepoints remain the same.
If you wish, you can specialize the call(Cont) goal, because you know
that the only Conts possible are of the form true/0, r/3 and s/3.
If there are builtins called and an "early" cut, you adapt the
transformation a bit - here is an example:
p(X,Y,Z) :- X > 17, !, functor(Y,Name,Arity), q(Name,Z), write(Z), foo(X,Y).
becomes:
p(X,Y,Z,Cont) :-
X > 17, !, functor(Y,Name,Arity),
q(Name,Z, mywrite(Z, foo(X,Y, Cont))).
mywrite(T,Cont) :- write(T), call(Cont).
If there are cuts in the body or perhaps some if-then-elses, then you
need to deal with them too.
Bin-Prolog binarizes clauses before compiling to a WAM like engine:
it doesn't have an environment stack. jProlog's compilation schema also
starts with binarizing.
Binarizing (+ some specialization stuff later on) can make programs
faster !
Look at http://www.binnetcorp.com/BinProlog/doc/advanced.html
for a bit more on binarization and some references.
Here is a different version of the TOH problem and its binarized
equivalent:
toh_orig(N,Moves,_From,_Over,_To,Tail) :-
N =:= 0, !,
Moves = Tail.
toh_orig(N,Moves,From,Over,To,Tail) :-
N > 0,
M is N - 1,
toh_orig(M,Moves,From,To,Over,T1),
T1 = [move(From,To)|T2],
toh_orig(M,T2,Over,From,To,Tail).
toh_bin(N,Moves,From,To,Over,Tail) :-
toh_bin(N,Moves,From,To,Over,Tail,true).
toh_bin(N,Moves,_From,_Over,_To,Tail,Con
t) :-
N =:= 0, !,
Moves = Tail,
call(Cont).
toh_bin(N,Moves,From,Over,To,Tail,Cont) :-
N > 0,
M is N - 1,
T1 = [move(From,To)|T2],
toh_bin(M,Moves,From,To,Over,T1,
toh_bin(M,T2,Over,From,To,Tail,Cont)).
Cheers
Bart Demoen
| |
| Markus Triska 2006-02-08, 9:29 am |
| Hi!
Bart Demoen wrote:
> the non-tail-recusion removal has almost no effect
How to best measure this? Here is what I thought I'm doing:
I allocate (on the global stack) 3 terms of arity 4 per call, makes
3*(4+1) = 15 memory cells, give or take a few for encompassing list term
etc. I looked at the definition of local frames in the SWI source and I
got the impression that environment frames are considerably larger than
this.
Thank you in advance,
Markus.
| |
| Markus Triska 2006-02-08, 9:29 am |
| Hi Bart!
Bart Demoen wrote:
> You gained local stack by introducing if-then-else
You're right. I'm sorry for my misleading advice; it was unintentional.
> the tail-recursion doesn't help in this case.
Please see below for my question about this.
All the best,
Markus.
| |
| Markus Triska 2006-02-08, 9:29 am |
| Hi Bart!
Bart Demoen wrote:
>
> Binarizing (+ some specialization stuff later on) can make programs
> faster !
What are drawbacks of binarizing?
Relatedly, I read that Aquarius Prolog can recognize mutual exclusion of
the two move_disks clauses in question and avoid creation of choice
points. Why didn't Aquarius and its techniques spread more widely in
your opinion?
Thank you for your explanations,
Markus.
| |
| Jan Wielemaker 2006-02-08, 9:29 am |
| On 2006-02-04, Markus Triska <triska@gmx.at> wrote:
> Hi!
>
> Bart Demoen wrote:
>
>
> How to best measure this? Here is what I thought I'm doing:
>
> I allocate (on the global stack) 3 terms of arity 4 per call, makes
> 3*(4+1) = 15 memory cells, give or take a few for encompassing list term
> etc. I looked at the definition of local frames in the SWI source and I
> got the impression that environment frames are considerably larger than
> this.
An environment frame takes 8 cells + variables in the current version.
One of these supports the debugger and another supports the profiler.
Whenever memory or performance becomes an issue, looking for unintended
non-determinism is a good start. For performance, indexing of predicates
with many clauses is the next thing to check.
Cheers --- Jan
| |
| Bart Demoen 2006-02-08, 9:29 am |
| Markus Triska wrote:
> Bart Demoen wrote:
>
>
>
> What are drawbacks of binarizing?
Binarizing can make programs slower :-)
No kidding. It is a well known fact that no compiler technique can speed
up all programs. Doesn't quite follow from Rice's theorem, but is close
in spirit.
Anyway, more to the point: binarizing (like transforming to continuation
passing, or agenda building) can consume more heap space and because the
liveness of heap terms is "whole term pointer based" (*), it can keep
"garbage" alive longer than it should. Of course in specific cases, one
can remedy that most often.
(*) if you don't know what I mean, that's fine: I just made it up;
the point is that if you represent an environment (on the local stack)
by a flat term, you can't do environment trimming (or the more
finegrained live-pointer-maps) on it - or at least, it is
difficult/cumbersome.
> Relatedly, I read that Aquarius Prolog can recognize mutual exclusion of
> the two move_disks clauses in question and avoid creation of choice
> points. Why didn't Aquarius and its techniques spread more widely in
> your opinion?
In the great-old Gr style of answering, here is a counter question:-)
How wide-spread do you think this mutual exclusion detection is(n't) ?
Here is an attempt at an answer to your question:
(1) mutual exclusion of clauses is undecidable in general (btw,
undecidability didn't stop Mercury for a wink), so why bother
(2) the programmer can rewrite the clauses so that no effort by
the compiler is needed
(3) [just my personal experience] if your Prolog system does it, some
"classical" benchmarks become uncomparably fast :-)
BTW, the same applies to multi-argument indexing.
Cheers
Bart Demoen
| |
| Bart Demoen 2006-02-08, 9:29 am |
| Markus Triska wrote:
>
>
> How to best measure this? Here is what I thought I'm doing:
You want to measure or you want to reason about it ?
In the former case, you lack the high-water-mark thingie (which is btw
so easy to do in almost any Prolog system that I am surprised
implementors don't bother).
In the latter case ...
Are you interested in constant factors or asymptotics ?
In the former case, you need Jan (for SWI) and he already told you.
In the latter case, you just write down some recursion relations and
solve them - you need this anyway to do the former case correct.
Hey, Markus, you have a degree in computer science, right ?
You should know about all this !
If you don't, you should complain bitterly with your university.
Cheers
Bart Demoen
| |
| Markus Triska 2006-02-08, 9:29 am |
| Hi Bart!
Bart Demoen wrote:
> you have a degree in computer science, right ?
No.
All the best,
Markus.
| |
| Bart Demoen 2006-02-08, 9:29 am |
| Markus Triska wrote:
> Hi Bart!
>
> Bart Demoen wrote:
>
>
>
> No.
>
> All the best,
> Markus.
Sorry, my mistake.
Cheers
Bart
| |
| student 2006-02-08, 9:29 am |
| Bart Demoen wrote:
> student wrote:
>
>
>
> Ah, finally someone who knows where "the clapper hangs in the clock"
oy!
> (flemmish expression).
> The excessive stack space usage is indeed due to the non-recognised
> determinism.
> The extra cut in the first clause solves that problem. Transforming it
> to if-then-else has the same effect and the non-tail-recusion removal
> (as proposed in other contributions) has almost no effect: it is easy to
> check that (with the cut in the first clause) the recursion depth at any
> point does not exceed the N in the original query - so tail recursion
> optimisation should be enough - Bill's statistics figures don't really
> show that (I think) because it gives statistics AFTER all computation is
> finished: what one would need is a high-water mark.
>
> I would however argue strongly against the cut in the second clause: it
> is totally unnecessary. [!]
Thank you for pointing this out to me.
Now I have to figure out why I didn't see it and how I might prove it.
Best regards,
billh
| |
| Jan Wielemaker 2006-02-08, 9:29 am |
| On 2006-02-04, Bart Demoen <bmd@cs.kuleuven.be> wrote:
> Markus Triska wrote:
>
>
> You want to measure or you want to reason about it ?
>
> In the former case, you lack the high-water-mark thingie (which is btw
> so easy to do in almost any Prolog system that I am surprised
> implementors don't bother).
I have some trouble with the semantics. Environment stack is fine, but
the others depend on GC. If you do a lot of aggressive GC you'll
high-water-mark will (often) be lower, but the program slower. If you
do no GC until you really have to the high-water-mark will be high.
So, what does the value mean? At least for SWI-Prolog taking the max
doesn't say you will run out of memory if you re-run the program with
limits that are below the measured high-water mark :-)
Its normal strategy is to double memory (but do not schrink below a
defined minimum). Last time it goes up to the limit. Before giving
up it will start doing frequent GC.
Cheers --- Jan
| |
| Markus Triska 2006-02-08, 9:29 am |
| Hi Bart!
Bart Demoen wrote:
>
> Sorry, my mistake.
>
No problem. What "recursion relations" did you have in mind?
All the best,
Markus.
| |
| Bart Demoen 2006-02-08, 9:29 am |
| Jan Wielemaker wrote:
>
>
> I have some trouble with the semantics. Environment stack is fine, but
> the others depend on GC.
Your worry about the semantics is justified: it is clear that when gc
has entered, the high-water mark figure is no longer easy to relate to
the memory footprint of a run of the program, but if the number of gcs
isn't too high, I can start SWI again with a larger initial heap so that
no gc occurs, and then the high water mark is again what it is.
And the highwater mark is also useful for smallish programs, or
programs that normally run with "large" input, but can be tested with
smallish input.
And if you want to make clear the figure could be misleading, let
statistics/0 print out something like
Heap: current = 256 maximal = 123456 (unreliable because of gc)
Cheers
Bart Demoen
| |
| Bart Demoen 2006-02-08, 9:29 am |
| Markus Triska wrote:
>
> No problem. What "recursion relations" did you have in mind?
Let's do some recursion relations for nrev
floop(foo,n) will denote
the number of floop units for doing foo on an input of size n
since for doing the asymptotics, the constants are not important, in
the formulae below a 1 could as well be 17 (but not in the n-1 expression :-)
or any other non-zero constant
nrev([],[]).
nrev([X|Rest],Ans):-
nrev(Rest,L),
append(L,[X],Ans).
append([],L,L).
append([X|L1],L2,[X|L3]):-
append(L1,L2,L3).
First the time complexity (assuming unit time for unification):
from the clauses of nrev:
time(nrev,0) = 1
time(nrev,n) = 1 + time(nrev,n-1) + time(append,n-1) for n > 0
from the clauses of append:
time(append,0) = 1
time(append,n) = 1 + time(append,n-1) n > 0
Solve them to find that nrev uses n^2 time.
Now (heap) space complexity:
space(nrev,0) = 0
space(nrev,n) = 1 + space(nrev,n-1) + space(append,n-1) n > 0
space(append,0) = 0
space(append,n) = 1 + space(append,n-1) n > 0
Let's do the call stack depth of TOH (abbreviating the Prolog code):
toh(0,From,Over,To,...) :- ...
toh(N,From,Over,To,...) :-
N > 0,
toh(N-1,...),
toh(N-1,...).
First do it without tail-recursion optimization (but with ! in the
first clause):
depth(toh,0) = 1
depth(toh,n) = 1 + max(depth(toh,n-1),depth(toh,n-1)) n > 0
The max expresses that when a call has returned, all stack space has
been released.
You see it uses linear call stack.
Now with TRO:
depth(toh,0) = 1
depth(toh,n) = 1 + max(depth(toh,n-1),depth(toh,n-1)-1) n > 0
The only difference is the -1 at the end: TRO deallocates the current
frame before activating the last call. The end result is the same.
Now pretend that there is no ! in the first clause; the effect is that
some call frames get blocked by a choicepoint. The equations become:
depth(toh,0) = 1
depth(toh,n) = 1 + depth(toh,n-1) + depth(toh,n-1) n > 0
This results in 2^n stack consumption.
Cheers
Bart Demoen
|
|
|
|
|