For Programmers: Free Programming Magazines  


Home > Archive > Prolog > March 2005 > Deleting elments of a list which are in it more than once









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Deleting elments of a list which are in it more than once
Christian Bruckhoff

2005-02-15, 4:03 pm

Hi.
I wanted to write a prolog programm which checks a given list of elements,
which are in it mor than once.
If an element is in twice ore more, all same elements following the first
one should be deketed.

For example:
Given list [a,b,a,a,r,s,z,s,b,t,s,b]
Output list [a,b,r,s,z,t]

My idea was he following:
scan([],[]).
scan([A|Rest],Neu):-member(A, Rest),delete(Rest,A,Neu1),append([A], Neu1,
Neu), scan(Rest,Neu).

But there must be a recursive call of scan for the rest of the list. And
there is my Problem: I can't find the rigth way to call it recursiely. :'(

Anyone has an idea?

--

cya
Christian Bruckhoff


Mick Krippendorf

2005-02-15, 4:03 pm

Christian Bruckhoff wrote:
> I wanted to write a prolog programm which checks a given list of
> elements, which are in it mor than once.
> If an element is in twice ore more, all same elements following the
> first one should be deketed.
>
> For example:
> Given list [a,b,a,a,r,s,z,s,b,t,s,b]
> Output list [a,b,r,s,z,t]
>
> My idea was he following:
> scan([],[]).
> scan([A|Rest],Neu):-member(A, Rest),delete(Rest,A,Neu1),append([A],
> Neu1, Neu), scan(Rest,Neu).
>
> But there must be a recursive call of scan for the rest of the list.
> And there is my Problem: I can't find the rigth way to call it
> recursiely. :'(



How about :

delete_multiples([],[]).
delete_multiples([H|X],[H|Y]):-
delete(X,H,T),
delete_multiples(T, Y).


Mick.
Christian Bruckhoff

2005-02-15, 4:03 pm

Thank you!

"Mick Krippendorf" <mad.mick@gmx.de> schrieb im Newsbeitrag
news:37ejd3F5bmcm2U1@individual.net...
> Christian Bruckhoff wrote:
>
>
> How about :
>
> delete_multiples([],[]).
> delete_multiples([H|X],[H|Y]):-
> delete(X,H,T),
> delete_multiples(T, Y).
>
>
> Mick.



W

2005-02-17, 8:58 pm

This algorithm is O(N**2) -- for each element in the list, you go through
the remainder of the list removing it.
suppose the list were [a,b,a,b,a,b,....,a,b] where .... stands for a lot of
a's and b's.
then you would go throught the list once, removing a, and then go through
the list again (although 1/2 as big) removing b.
An alternative would be to keep track of what you have removed (eg a and b)
and check that list as you go through the original list.
In other words, check each new element with the accumulating list rather
than the existing list.

Of course, the behavior in each case depends on the distribution of
duplicate elements. In the case where there are no duplicates, the
performance is the same. Also in the case where there are only duplicates
(all a, for example).

Anyway, have fun.

Walter Wilson



"Christian Bruckhoff" <brchrist@uni-koblenz.de> wrote in message
news:cut7ae$c9e$1@news.uni-koblenz.de...
> Thank you!
>
> "Mick Krippendorf" <mad.mick@gmx.de> schrieb im Newsbeitrag
> news:37ejd3F5bmcm2U1@individual.net...
>
>



student

2005-02-18, 3:59 am

W wrote:
[color=darkred]
> This algorithm is O(N**2) -- for each element in the list, you go through
> the remainder of the list removing it.


> Suppose the list were [a,b,a,b,a,b,....,a,b] where .... stands for a lot of
> a's and b's.


> Then you would go throught the list once, removing a, and then go through
> the list again (although 1/2 as big) removing b.


> An alternative would be to keep track of what you have removed ... .


Why?

'delete(L1,X,L2)' deletes all occurrences of X from L1.

Therefore, the input list in the recursive call to 'delete_multiples'
cannot contain any instances of any term that has already been seen.

Let me rephrase the matter.

delete_multiples([],[]).
delete_multiples([X|Rest],[X|NoXsAndNoMu
ltiples]):-
write(X),write(Rest),nl,
delete(Rest,X,NoXs),
write(NoXs),nl,
delete_multiples(NoXs,NoXsAndNoMultiples
).

?- delete_multiples([a,b,c,a,b,c,a,b,c,a,b,
c,a,b,c],Q).
a[b, c, a, b, c, a, b, c, a, b, c, a, b, c]
[b, c, b, c, b, c, b, c, b, c]
b[c, b, c, b, c, b, c, b, c]
[c, c, c, c, c]
c[c, c, c, c]
[]

Q = [a, b, c]

What is the 'N' in your 'O(N*N)'?

Is the the number of unique terms that occur among the members of L or
is it the length of L?
--
Bill Spight

2005-02-18, 8:57 am

Dear Christian,

It is possible to program this using an incomplete list. If the element
to be added is already in the list, nothing changes; if it is not, it is
added to the undefined tail of the list.

unique_list([],[]).
unique_list([X],[X]).
unique_list([X|Xs],[X|Ys]) :-
unique_list(Xs,[X|Ys],Ys). % Sets up the incomplete list,
% with an undefined tail (Ys).

unique_list([],_,[]) :- !. % Defines the tail.
unique_list([X|Xs],Ys,Zs0) :-
insert_if_absent(Ys,X,Zs0,Zs),
unique_list(Xs,Ys,Zs).

insert_if_absent([X|_],X,Ys,Ys) :- var(Ys), !. % Present.
% The undefined tail (Ys), remains undefined (a variable).
insert_if_absent([X|Xs],X,[X|Xs],Xs) :- !. % Absent. Adds X to
tail.
insert_if_absent([_|Xs],X,Ys,Zs) :-
insert_if_absent(Xs,X,Ys,Zs).

Best regards,

Bill Spight
Salvador Fandiņo

2005-02-18, 8:57 am

W wrote:
> This algorithm is O(N**2) -- for each element in the list, you go through
> the remainder of the list removing it.
> suppose the list were [a,b,a,b,a,b,....,a,b] where .... stands for a lot of
> a's and b's.
> then you would go throught the list once, removing a, and then go through
> the list again (although 1/2 as big) removing b.
> An alternative would be to keep track of what you have removed (eg a and b)
> and check that list as you go through the original list.
> In other words, check each new element with the accumulating list rather
> than the existing list.


You are still on the O(N**2) side.

A better algorithm is:

- conveniently prepare the list for sorting [O(n)]
- sort the list [O(NlogN)]
- mark duplicates [O(N)] and
- unsort the list while ignoring dups [O(N)]

that is O(NlogN) globally:


prepare([], []).
prepare([H|T], [(H,_)|T1]) :-
prepare(T, T1).

mark_dups([]).
mark_dups([(Key, no_del)|More]) :-
mark_dups(Key, More).

mark_dups(Key, [(Ele, del)|More]) :-
Key == Ele, !,
mark_dups(Key, More).
mark_dups(_, L) :-
mark_dups(L).

clean([], []).
clean([(_, del)|M], M1) :-
!,
clean(M, M1).
clean([(H, no_del)|M], [H|M1]) :-
clean(M, M1).

rm_dups(L, O) :-
prepare(L, L1),
sort(L1, L2),
mark_dups(L2),
clean(L1, L3),
L3 = O.

:- I = [a,a,b,h,a,j,b,y,a,h,h], rm_dups(I, O), writeln(O).


Cheers,

- Salvador

bill hogan

2005-02-18, 3:59 pm

student wrote:
> W wrote:
>
>
>
>
>
>
>
>
>
>
> Why?


Read what he says, idiot!

He is talking about a radically different program, not about
modifying the present one:

"An alternative would be to keep track of what you have removed (eg a and b)
and check that list as you go through the original list.
In other words, check each new element ***with the accumulating list***
rather
than the existing list." (my emphasis)

-- i.e., something like

unique(L1,L2) :-
unique_members(L1,[],L2).

unique_members([],In,In).
unique_members([X|Rest],In,Out) :-
\+ \+ member(X,In),
unique_members(Rest,In,Out).
unique_members([X|Rest],In,Out) :-
\+ member(X,In),
unique_members(Rest,[X|In],Out).

make_list(0,[]).
make_list(N,[N1,N2|Rest]) :-
N > 0,
N1 is N -1,
N2 is N + 1,
Next is N -1,
make_list(Next,Rest).

whence

?- time((make_list(10000,L),delete_multiple
s(L,Q))).
% 100,090,001 inferences, 444.22 CPU in 508.89 seconds (87% CPU,
225316 Lips)

L = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
Q = [9999, 10001, 9998, 10000, 9997, 9996, 9995, 9994, 9993|...] ;

No

?- time((make_list(10000,L),unique(L,Q))).
% 100,150,006 inferences, 173.16 CPU in 173.41 seconds (100% CPU,
578367 Lips)

L = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
Q = [0, 1, 2, 3, 4, 5, 6, 7, 8|...] ;

No

and if you use a "!" in the second clause in 'unique_members', you cut
the latter CPU time in half.

--
Bart Demoen

2005-02-18, 8:57 pm

student wrote:
>
> What is the 'N' in your 'O(N*N)'?
>
> Is the the number of unique terms that occur among the members of L or
> is it the length of L?


Yes.

Bart Demoen
Bart Demoen

2005-02-18, 8:57 pm

Salvador Fandiņo wrote:
> A better algorithm is:
>
> - conveniently prepare the list for sorting [O(n)]
> - sort the list [O(NlogN)]
> - mark duplicates [O(N)] and
> - unsort the list while ignoring dups [O(N)]
>
> that is O(NlogN) globally:
>
>
> prepare([], []).
> prepare([H|T], [(H,_)|T1]) :-
> prepare(T, T1).
>
> mark_dups([]).
> mark_dups([(Key, no_del)|More]) :-
> mark_dups(Key, More).
>
> mark_dups(Key, [(Ele, del)|More]) :-
> Key == Ele, !,
> mark_dups(Key, More).
> mark_dups(_, L) :-
> mark_dups(L).
>
> clean([], []).
> clean([(_, del)|M], M1) :-
> !,
> clean(M, M1).
> clean([(H, no_del)|M], [H|M1]) :-
> clean(M, M1).
>
> rm_dups(L, O) :-
> prepare(L, L1),
> sort(L1, L2),
> mark_dups(L2),
> clean(L1, L3),
> L3 = O.
>
> :- I = [a,a,b,h,a,j,b,y,a,h,h], rm_dups(I, O), writeln(O).



Your program is very nice, but has one drawback: according to the ISO
Prolog standard, its outcome is not defined - it is "implementation
dependent" - which is slightly worse than "implementation defined".

The reason: the (H,_) in prepare/2 results in the comparison of
variables during the execution of the sort/2 goal in rm_dups/2 (if
the initial list contains duplicates).

There is a fix, and since you are already close to it, why don't you
make the fix and post it :-)

Cheers

Bart Demoen

Salvador Fandiņo

2005-02-18, 8:57 pm

Bart Demoen wrote:
> Salvador Fandi=F1o wrote:
>=20
>=20
>=20
>=20
> Your program is very nice, but has one drawback: according to the ISO
> Prolog standard, its outcome is not defined - it is "implementation
> dependent" - which is slightly worse than "implementation defined".
>=20
> The reason: the (H,_) in prepare/2 results in the comparison of
> variables during the execution of the sort/2 goal in rm_dups/2 (if
> the initial list contains duplicates).
>=20
> There is a fix, and since you are already close to it, why don't you
> make the fix and post it :-)


I have no access to the ISO standard so I can only guess :-(

Is it because sort/2 can eliminate as duplicates elements that=20
unify? so, msort/2 should be used, well, actually I suppose I=20
should have used msort/2 from the first time anyway...

The other problem I can see is that an stable sorting algorithm=20
is required. As a work around the pair (H,_) can be replaced by a=20
(H, Ordinal, _) trio where Ordinal is the element index on the=20
original list.

Salvador Fandiņo

2005-02-18, 8:57 pm

Hi,

> Is it because sort/2 can eliminate as duplicates elements that unify?
> so, msort/2 should be used, well, actually I suppose I should have used
> msort/2 from the first time anyway...
>
> The other problem I can see is that an stable sorting algorithm is
> required. As a work around the pair (H,_) can be replaced by a (H,
> Ordinal, _) trio where Ordinal is the element index on the original list.


oops, I see :-)

Comparison of (a, X) and (a, Y) is undefined, so the best
solution is moving to keysort/2 using (H-_) to construct the
list... or using the Ordinal trick if the keysort implementation
is not stable.

Cheers,

- Salvador.

bill hogan

2005-02-19, 8:58 pm

Salvador Fandiņo wrote:
> W wrote:
>
>
>
> You are still on the O(N**2) side.
>
> A better algorithm is:
>
> - conveniently prepare the list for sorting [O(n)]
> - sort the list [O(NlogN)]
> - mark duplicates [O(N)] and
> - unsort the list while ignoring dups [O(N)]
>
> that is O(NlogN) globally:
>
>
> prepare([], []).
> prepare([H|T], [(H,_)|T1]) :-
> prepare(T, T1).
>
> mark_dups([]).
> mark_dups([(Key, no_del)|More]) :-
> mark_dups(Key, More).
>
> mark_dups(Key, [(Ele, del)|More]) :-
> Key == Ele, !,
> mark_dups(Key, More).
> mark_dups(_, L) :-
> mark_dups(L).
>
> clean([], []).
> clean([(_, del)|M], M1) :-
> !,
> clean(M, M1).
> clean([(H, no_del)|M], [H|M1]) :-
> clean(M, M1).
>
> rm_dups(L, O) :-
> prepare(L, L1),
> sort(L1, L2), /* In SWI-Prolog 'sort/2' removes duplicates; 'msort/2' does not. */
> mark_dups(L2),
> clean(L1, L3), /* Shouldn't this be 'clean(L2,L3)' ? */
> L3 = O.
>
> :- I = [a,a,b,h,a,j,b,y,a,h,h], rm_dups(I, O), writeln(O).
>
>
> Cheers,
>
> - Salvador
>


Thank you!

But why not simply

rm_dups(L, O) :-
msort(L, M), /* N.B. In SWI-Prolog 'sort/2' removes duplicates;
'msort/2' does not. */
clean(M, O).

clean([],[]).
clean([Keep|Terms],[Keep|Keepers]) :-
del_if_same(Keep, Terms, Keepers).

del_if_same(Keep,[Next|Terms], Keepers) :-
Keep == Next,
!,
del_if_same(Keep,Terms,Keepers).
del_if_same(_,Terms,Keepers) :-
clean(Terms, Keepers).

?

This gives

?- time((make_list(10000,I),rm_dups(I,O),le
ngth(O,N))).
% 110,008 inferences, 0.31 CPU in 0.33 seconds (93% CPU, 354865 Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...]
N = 10002

By contrast, my

unique2(L1,L2) :-
unique_members2(L1,[],L2).

unique_members2([],In,In).
unique_members2([X|Rest],In,Out) :-
member(X,In),!,
unique_members2(Rest,In,Out).
unique_members2([X|Rest],In,Out) :-
unique_members2(Rest,[X|In],Out).

gives

?- time((make_list(10000,I),unique2(I,O),le
ngth(O,N))).
% 50,125,005 inferences, 87.93 CPU in 88.12 seconds (100% CPU, 570056 Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...]
N = 10002

In an attempt to explain this huge difference to myself,
I was led to reformulate 'unique2' (above) as follows:

unique3(L1,L2) :-
msort(L1,M),
unique_members3(M,[],L2).

unique_members3([],In,In).
unique_members3([X|Rest],In,Out) :-
In=[X|_],
!,
unique_members3(Rest,In,Out).
unique_members3([X|Rest],In,Out) :-
unique_members3(Rest,[X|In],Out).

This gives

?- time((make_list(10000,I),unique3(I,O),le
ngth(O,N))).
% 90,006 inferences, 0.34 CPU in 0.35 seconds (98% CPU, 264724 Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
O = [10001, 10000, 9999, 9998, 9997, 9996, 9995, 9994, 9993|...]
N = 10002

which I think compares favorably to the results I get from your 'rm_dups'
(replacing 'sort' with 'msort' and modified per my comment above):

?- time((make_list(10000,I),rm_dups(I,O),le
ngth(O,N))).
% 150,011 inferences, 0.65 CPU in 0.67 seconds (97% CPU, 230786 Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...]
N = 10002

Altogether very interesting and not a lesson I am likely to soon forget.

--
Salvador Fandiņo

2005-02-20, 3:57 pm

Hi,

bill hogan wrote:

[color=darkred]
> ...


>
> But why not simply
> ...



because I was trying to maintain the order of the original list
as much as possible.


> rm_dups(L, O) :-
> msort(L, M), /* N.B. In SWI-Prolog 'sort/2' removes duplicates;
> 'msort/2' does not. */
> clean(M, O).
>
> clean([],[]).
> clean([Keep|Terms],[Keep|Keepers]) :-
> del_if_same(Keep, Terms, Keepers).
>
> del_if_same(Keep,[Next|Terms], Keepers) :-
> Keep == Next,
> !,
> del_if_same(Keep,Terms,Keepers).
> del_if_same(_,Terms,Keepers) :-
> clean(Terms, Keepers).
>
> ?


If you want to remove duplicates and the order of the output list
is not important to you them just use sort/2:

rm_dups(L, O) :- sort(L, O).




> This gives
>
> ?- time((make_list(10000,I),rm_dups(I,O),le
ngth(O,N))).
> % 110,008 inferences, 0.31 CPU in 0.33 seconds (93% CPU, 354865 Lips)
>
> I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
> O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...]
> N = 10002
>
> By contrast, my
>
> unique2(L1,L2) :-
> unique_members2(L1,[],L2).
>
> unique_members2([],In,In).
> unique_members2([X|Rest],In,Out) :-
> member(X,In),!,
> unique_members2(Rest,In,Out).
> unique_members2([X|Rest],In,Out) :-
> unique_members2(Rest,[X|In],Out).
>
> gives
>
> ?- time((make_list(10000,I),unique2(I,O),le
ngth(O,N))).
> % 50,125,005 inferences, 87.93 CPU in 88.12 seconds (100% CPU, 570056 Lips)
>
> I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
> O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...]
> N = 10002


Well, it is not a fair comparison, because msort/2 (or sort/2) is
implemented on C while your code was pure prolog ;-)


> In an attempt to explain this huge difference to myself,
> I was led to reformulate 'unique2' (above) as follows:
>
> unique3(L1,L2) :-
> msort(L1,M),
> unique_members3(M,[],L2).
>
> unique_members3([],In,In).
> unique_members3([X|Rest],In,Out) :-
> In=[X|_],


using =/2 here deletes not only equal terms but also terms that
unify sometimes. Better do...

unique_members3([X|Rest], In, Out) :-
In=[Y|_],
X==Y, !,
unique_members3(Rest, In, Out).

> !,
> unique_members3(Rest,In,Out).
> unique_members3([X|Rest],In,Out) :-
> unique_members3(Rest,[X|In],Out).
>
> This gives
>
> ?- time((make_list(10000,I),unique3(I,O),le
ngth(O,N))).
> % 90,006 inferences, 0.34 CPU in 0.35 seconds (98% CPU, 264724 Lips)
>
> I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
> O = [10001, 10000, 9999, 9998, 9997, 9996, 9995, 9994, 9993|...]
> N = 10002
>
> which I think compares favorably to the results I get from your 'rm_dups'
> (replacing 'sort' with 'msort' and modified per my comment above):
>
> ?- time((make_list(10000,I),rm_dups(I,O),le
ngth(O,N))).
> % 150,011 inferences, 0.65 CPU in 0.67 seconds (97% CPU, 230786 Lips)
>
> I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
> O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...]
> N = 10002
>
> Altogether very interesting and not a lesson I am likely to soon forget.
>
> --


Cheers,

- Salvador
Bart Demoen

2005-02-20, 8:57 pm

bill hogan wrote:

> I was led to reformulate 'unique2' (above) as follows:
>
> unique3(L1,L2) :-
> msort(L1,M),
> unique_members3(M,[],L2).
>
> unique_members3([],In,In).
> unique_members3([X|Rest],In,Out) :-
> In=[X|_],
> !,
> unique_members3(Rest,In,Out).
> unique_members3([X|Rest],In,Out) :-
> unique_members3(Rest,[X|In],Out).
>
> This gives
>
> ?- time((make_list(10000,I),unique3(I,O),le
ngth(O,N))).
> % 90,006 inferences, 0.34 CPU in 0.35 seconds (98% CPU, 264724 Lips)
>
> I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
> O = [10001, 10000, 9999, 9998, 9997, 9996, 9995, 9994, 9993|...]
> N = 10002
>
> which I think compares favorably to the results I get from your 'rm_dups'
> (replacing 'sort' with 'msort' and modified per my comment above):


Maybe I am missing something here, but does your unique3/2 do exactly
the same as

unique4(L1,L2) :-
sort(L1,L3),
rev(L3,[],L2).

rev([],L,L).
rev([X|R],In,Out) :- rev(R,[X|In],Out).

on ground lists (and on all lists if the = is replaced by == in
unique_members3/3 - barring the comparison of variables of course) ?

Cheers

Bart Demoen
bill hogan

2005-02-21, 3:58 am

Salvador Fandiņo wrote:
> Hi,
>
> bill hogan wrote:
>

....
>
>

....
>
>
> Well, it is not a fair comparison, because msort/2 (or sort/2) is
> implemented on C while your code was pure prolog ;-)
>


Thank you.

My "By contrast, ..." was misplaced: it was not my intention
to compare 'unique2' (above) with (my version of) your 'rm_dups'.

The difference I was trying to call attention to was that between
'unique2' (see above), which, because it makes no assumptions about its
input list, has to
check the current input element against all the elements in the
accumulator, and (what I
should have written as)

unique3(L1,L2) :-
unique_members3(L1,[],L2).

unique_members3([],In,In).
unique_members3([X|Rest],In,Out) :-
In=[X|_],
!,
unique_members3(Rest,In,Out).
unique_members3([X|Rest],In,Out) :-
unique_members3(Rest,[X|In],Out).

which, because it assumes that its input list has been sorted,
only has to check the current input element against the first element
in the acccumulator.

That is,

?- make_list(10000,I), time((unique2(I,O))).
% 50,075,002 inferences, 86.03 CPU in 86.16 seconds (100% CPU, 582064
Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...] ;

versus

?- make_list(10000,I), msort(I,SI), time((unique3(SI,O))).
% 40,002 inferences, 0.07 CPU in 0.07 seconds (101% CPU, 571457 Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
SI = [0, 1, 2, 2, 3, 3, 4, 4, 5|...]
O = [10001, 10000, 9999, 9998, 9997, 9996, 9995, 9994, 9993|...] ;

How the input to 'unique3' comes to be sorted (if it is) and whether
it is worthwhile sorting it (if it is not) are separate issues.

--

bill hogan

2005-02-21, 8:59 pm

Bart Demoen wrote:
> bill hogan wrote:
>
>
>
> Maybe I am missing something here, but does your unique3/2 do exactly
> the same as
>
> unique4(L1,L2) :-
> sort(L1,L3),
> rev(L3,[],L2).
>
> rev([],L,L).
> rev([X|R],In,Out) :- rev(R,[X|In],Out).
>
> on ground lists (and on all lists if the = is replaced by == in
> unique_members3/3 - barring the comparison of variables of course) ?
>


Yes (assuming 'sort/2' removes duplicates).

I used 'msort' precisely because I was interested in comparing the
problem
of removing duplicate elements from an unsorted list ('unique2')
to the problem of removing duplicate elements from a sorted list.
('unique3').

As I pointed out in another reply, I should have made this clear by
writing

unique3(L1,L2) :-
unique_members3(M,[],L2).

and then comparing

make_list(10000,I), time( (unique2(I,O)) ).

with

make_list(10000,I), msort(I,S), time( (unique3(S,O)) ).

As to the question of whether or not

make_list(N,I), time( (my_sort(I,S), unique3(S,O)) ).

is in reality significantly faster than

make_list(N,I), time((unique2(I,O))).

for large N when 'my_sort' is written in Prolog (and does not remove
duplicates), I am not so sure.

For example, with

/*
from http://kti.ms.mff.cuni.cz/~bartak/prolog/sorting.html
*/

my_msort([],[]).
my_msort([X],[X]).
my_msort(List,Sorted):-
List=[_,_|_],
divide(List,L1,L2),
my_msort(L1,Sorted1),
my_msort(L2,Sorted2),
my_merge(Sorted1,Sorted2,Sorted).

my_merge([],L,L).
my_merge(L,[],L):-L\=[].
my_merge([X|T1],[Y|T2],[X|T]):-
X=<Y,
my_merge(T1,[Y|T2],T).
my_merge([X|T1],[Y|T2],[Y|T]):-
X>Y,
my_merge([X|T1],T2,T).

divide(L,A,B):-hv(L,[],A,B).

hv(L,L,[],L). % for lists of even length
hv(L,[_|L],[],L). % for lists of odd length
hv([H|T],Acc,[H|L],B):-hv(T,[_|Acc],L,B).


just the cost of sorting the input list far exceeds the cost of simply
picking out
successive unique elements from the original unsorted list:

?- make_list(10000,L), time((my_msort(L,_))).
% 868,486 inferences, 114.05 CPU in 114.82 seconds (99% CPU, 7615 Lips)

?- make_list(10000,I), time((unique2(I,O))).
% 50,075,002 inferences, 87.34 CPU in 90.90 seconds (96% CPU, 573334 Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...]

-
bill
Bart Demoen

2005-02-22, 8:58 am

bill hogan wrote:

> For example, with
>
> /*
> from http://kti.ms.mff.cuni.cz/~bartak/prolog/sorting.html
> */

[...]
>
> just the cost of sorting the input list far exceeds the cost of simply
> picking out
> successive unique elements from the original unsorted list:
>
> ?- make_list(10000,L), time((my_msort(L,_))).
> % 868,486 inferences, 114.05 CPU in 114.82 seconds (99% CPU, 7615 Lips)
>
> ?- make_list(10000,I), time((unique2(I,O))).
> % 50,075,002 inferences, 87.34 CPU in 90.90 seconds (96% CPU, 573334 Lips)
>


Roman Bartak (author of the webpage you mention above) is a wise and cautious man.
He writes in the comment just above the code for his version of merge sort:

If properly implemented it could be a very efficient algorithm.

Which means to me that he knows that what follows is not properly implemented,
and consequentely not very efficient ...

The version of merge sort you took from Roman leaves a lot of unnecessary choicepoints
(in any Prolog I know of) but the main problem is the implementation of divide/3
which is really horrible for efficiency.

Try the following version

divide([],[],[]).
divide([X|R],[X|Out1],Out2) :- divide(R,Out2,Out1).

instead and you will see a tremendous speedup (factor 100 or so for the test you did).
If you additionally get rid of all unnecessary choice points in merge and msort, you might
gain even more.

After you have done that, maybe you want to reconsider your conclusion regarding
unique2 ?

Cheers

Bart Demoen
Joachim Schimpf

2005-02-22, 8:58 am

Bart Demoen wrote:
>
> Your program is very nice, but has one drawback: according to the ISO
> Prolog standard, its outcome is not defined - it is "implementation
> dependent" - which is slightly worse than "implementation defined".
>
> The reason: the (H,_) in prepare/2 results in the comparison of
> variables during the execution of the sort/2 goal in rm_dups/2 (if
> the initial list contains duplicates).



I think you are making the standard worse than it is.
Comparison of variables is not completely implementation-dependent,
in particular, it does not allow different variables to compare
identical.

I can therefore not see any problem with Salvador's program.
Have I overlooked something?


--
Joachim Schimpf / phone: +44 20 7594 8187
IC-Parc / mailto:J.Schimpf@imperial.ac.uk
Imperial College London / http://www.icparc.ic.ac.uk/eclipse

Bart Demoen

2005-02-22, 8:59 pm

Joachim Schimpf wrote:
> Bart Demoen wrote:
>
>
>
>
> I think you are making the standard worse than it is.
> Comparison of variables is not completely implementation-dependent,
> in particular, it does not allow different variables to compare
> identical.
>
> I can therefore not see any problem with Salvador's program.
> Have I overlooked something?


The program tries to remove "later" duplicates - as can be seen
from the queries

?- rm_dups([a,b,a],L).

L = [a,b]
Yes
?- rm_dups([b,a,b],L).

L = [b,a]
Yes

To achieve this, it is crucial that the void variables created in
(H,_) are ordered (by sort - which uses @< which is implementation
dependent by section 8.4.1.4 of ISO) by their creation time. If that
order were not guaranteed, then rm_dups wouldn't work as intended.
So the program relies on a particular implementation of an
implementation dependent feature.

Another problem would be that since comparison of different variables
is implementation dependent, an implementation might throw an exception
(during sort) whenever two variables are compared (unless they are equal).

Or have I overlooked something ?

Cheers

Bart Demoen
Salvador Fandiņo

2005-02-22, 8:59 pm

Bart Demoen wrote:

> The version of merge sort you took from Roman leaves a lot of
> unnecessary choicepoints
> (in any Prolog I know of) but the main problem is the implementation of
> divide/3
> which is really horrible for efficiency.
>
> Try the following version
>
> divide([],[],[]).
> divide([X|R],[X|Out1],Out2) :- divide(R,Out2,Out1).


this is not the proper way to break a list for mergesort because
it will make the sorting unstable.

Good news is that doing it right is equally faster and uses less
memory :-)

--------------------------------------
divide(L, A, B) :-
div1(L, L, A, B).

div1([], L, [], L).
div1([_|T], [H|L], [H|A], B) :-
div2(T, L, A, B).

div2([], L, [], L).
div2([_|T], L, A, B) :-
div1(T, L, A, B).


bad_divide([], [], []).
bad_divide([H|L], [H|A], B) :-
bad_divide(L, B, A).

divide_n(N, L) :-
( N>0
-> divide(L,_,_),
N1 is N-1,
divide_n(N1, L)
; true ).

bad_divide_n(N, L) :-
( N>0
-> bad_divide(L,_,_),
N1 is N-1,
divide_n(N1, L)
; true ).

make_list(0,[]).
make_list(N,[N1,N2|Rest]) :-
N > 0,
N1 is N -1,
N2 is N + 1,
Next is N -1,
make_list(Next,Rest).

--------------------------------

?- make_list(10000, L), time(bad_divide_n(1000, L)).
% 20,005,001 inferences, 58.85 CPU in 61.03 seconds (96% CPU,
339932 Lips)

?- make_list(10000, L), time(divide_n(1000, L)).
% 20,005,002 inferences, 58.92 CPU in 61.25 seconds (96% CPU,
339528 Lips)

?- make_list(1000, L), time(bad_divide_n(10000, L)).
% 20,050,001 inferences, 50.47 CPU in 52.20 seconds (97% CPU,
397266 Lips)

?- make_list(1000, L), time(divide_n(10000, L)).
% 20,050,002 inferences, 51.05 CPU in 52.25 seconds (98% CPU,
392752 Lips)

?- make_list(100, L), time(bad_divide_n(100000, L)).
% 20,500,001 inferences, 48.68 CPU in 50.32 seconds (97% CPU,
421118 Lips)

?- make_list(100, L), time(divide_n(100000, L)).
% 20,500,002 inferences, 48.60 CPU in 50.22 seconds (97% CPU,
421811 Lips)



Cheers,

- Salvador




Joachim Schimpf

2005-02-22, 8:59 pm

Bart Demoen wrote:
>
> The program tries to remove "later" duplicates
> ...
> To achieve this, it is crucial that the void variables created in
> (H,_) are ordered (by sort - which uses @< which is implementation
> dependent by section 8.4.1.4 of ISO) by their creation time.


Yes, you're right...


>
> Another problem would be that since comparison of different variables
> is implementation dependent, an implementation might throw an exception
> (during sort) whenever two variables are compared (unless they are equal).


Well, sort/2 is not specified in the standard, and I'd consider such
an implementation broken. Sorting for the purpose of making duplicates
adjacent is a very common technique after all. Sure, it's metalogical,
but so is sorting even with a single variable in the list.


--
Joachim Schimpf / phone: +44 20 7594 8187
IC-Parc / mailto:J.Schimpf@imperial.ac.uk
Imperial College London / http://www.icparc.ic.ac.uk/eclipse

Salvador Fandiņo

2005-02-22, 8:59 pm

Salvador Fandi=F1o wrote:
> Bart Demoen wrote:
>=20
>=20
>=20
> this is not the proper way to break a list for mergesort because it wil=

l=20
> make the sorting unstable.
>=20
> Good news is that doing it right is equally faster and uses less memory=

:-)

There was a bug on the timing function for bar_divide, actually=20
my divide predicate is faster, at least for long lists :-))

> ?- make_list(10000, L), time(bad_divide_n(1000, L)).

% 20,004,002 inferences, 68.54 CPU in 71.13 seconds (96% CPU,=20
291859 Lips)

> ?- make_list(10000, L), time(divide_n(1000, L)).

% 20,005,002 inferences, 57.35 CPU in 58.22 seconds (99% CPU,=20
348823 Lips)

> ?- make_list(1000, L), time(bad_divide_n(10000, L)).

% 20,040,002 inferences, 52.84 CPU in 54.68 seconds (97% CPU,=20
379258 Lips)

> ?- make_list(1000, L), time(divide_n(10000, L)).

% 20,050,002 inferences, 48.71 CPU in 49.98 seconds (97% CPU,=20
411620 Lips)

> ?- make_list(100, L), time(bad_divide_n(100000, L)).

% 20,400,002 inferences, 48.78 CPU in 50.59 seconds (96% CPU,=20
418204 Lips)

> ?- make_list(100, L), time(divide_n(100000, L)).

% 20,500,002 inferences, 47.50 CPU in 48.33 seconds (98% CPU,=20
431579 Lips)

> ?- make_list(10, L), time(bad_divide_n(1000000, L)).

% 24,000,002 inferences, 55.18 CPU in 57.23 seconds (96% CPU,=20
434940 Lips)

> ?- make_list(10, L), time(divide_n(1000000, L)).

% 25,000,002 inferences, 55.15 CPU in 56.75 seconds (97% CPU,=20
453309 Lips)


I have also run the timings on YAP:

% 10000 elements
cputime(5.121)
cputime(3.613)

% 1000 elements
cputime(2.389)
cputime(1.639)

% 100 elements
cputime(2.312)
cputime(1.939)

% 10 elements
cputime(3.138)
cputime(2.756)

Cheers,

- Salvador.
student

2005-02-22, 8:59 pm

Bart Demoen wrote:
> bill hogan wrote:
>
> [...]
>
....[color=darkred]
>
> The version of merge sort you [used] ... leaves a lot of
> unnecessary choicepoints
> (in any Prolog I know of) but the main problem is the implementation of
> divide/3
> which is really horrible for efficiency.
>
> Try the following version
>
> divide([],[],[]).
> divide([X|R],[X|Out1],Out2) :- divide(R,Out2,Out1).
>


Ok, with just that change, I get


?- make_list(10000,I),time( (unique2(I,O)) ).

% 50,075,035 inferences, 89.00 CPU in 90.12 seconds (99% CPU, 562641 Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
O = [0, 1, 2, 3, 4, 5, 6, 7, 8|...]

?- make_list(10000,I),time( (my_msort(I,SI),unique3(SI,O)) ).
% 1,260,116 inferences, 5.77 CPU in 6.26 seconds (92% CPU, 218391 Lips)

I = [9999, 10001, 9998, 10000, 9997, 9999, 9996, 9998, 9995|...]
SI = [0, 1, 2, 2, 3, 3, 4, 4, 5|...]
O = [10001, 10000, 9999, 9998, 9997, 9996, 9995, 9994, 9993|...]

Wow!

Thank you.

>
> After you have done that, maybe you want to reconsider your conclusion
> regarding unique2 ?
>


I was careful not to draw any conclusions.

I simply reported some facts that I thought needed to be squared
because they raised in my mind the possiblility that the
proposition in question might assume the availability of more
computing resources than I happened to have at my disposal:
[color=darkred]

(*** emphasis added ***).

That, happily, turns out not to have been the case, but /something/
is still bugging me.

Sorting is not intrinsic to the idea of picking out the unique
elements of a list: here, it is merely being used to impose adjacency
on like elements, so why should sorting a list first and then applying
unique3 *necessarily* be faster than applying unique2 to the
original unsorted list?

What are the assumptions on which that conclusion rests?

Specifically, if the list membership test in Prolog were
O(log(n)), would this apparent computational advantage not
disappear? If it would, is it possible to implement lists (or possibly
"sets") in such a way that the corresponding membership test would be
O(log(n))? If it is, why hasn't that been done -- or has it? If it is
not, why not?

ps. I am not being argumentative. These questions have been bugging me
for a very long time. In fact, as I discovered in the course of
composing this reply, my very first post in this newsgroup, in 1993,
(<750066824.2@wyrm.rbbs-net.ORG> ) was about this very issue.

--
billh

Bart Demoen

2005-02-24, 8:58 am

student wrote:

> I was careful not to draw any conclusions.


You had written previously:

> Altogether very interesting and not a lesson I am likely to soon forget.


so I assumed you had made some conclusion - but I might be wrong.


> Sorting is not intrinsic to the idea of picking out the unique
> elements of a list: here, it is merely being used to impose adjacency
> on like elements, so why should sorting a list first and then applying
> unique3 *necessarily* be faster than applying unique2 to the
> original unsorted list?


The difference in intrinsic complexity of the algorithms used: one is O(nlogn),
the other O(n^2) (I think this was pointed out earlier by someone else, no ?)
+ the fact that you use these algorithms on input that is large enough to make
the sorting version faster.


> What are the assumptions on which that conclusion rests?


Large input and worst case analysis: the worst case for the unique2
program is that the number of different elements in the list is
proportional to the length of the list. That gives you O(N^2).
O(NlogN) beats O(N^2) for large input.


> Specifically, if the list membership test in Prolog were
> O(log(n)), would this apparent computational advantage not
> disappear?


Yes of course, but you might lose some other properties of Prolog
lists that you really care a lot about as well.

I will try to use terminology as follows:

a list is the thing that gives contant time access to
the first element and to the tail, which is also a list;
you can add an element to a list in constant time and delete
the first element in constant time;
you can instantiate the open tail of an open ended list in
constant time;
list elements can be partially instantiated and later instantiated
with a cost that does not depend on the size of the list ...
and the constant times involved above are small: list processing
is fast

a Prolog list is the thing you write with [ ] and it usually has the
properties above - in fact, most often one counts on Prolog lists
having the above properties


The membership test for a Prolog list is not constant time, not even
O(logN).

Constant time membership test can be achieved by a hashtable, so let's
see whether we can use hashtables for implementing a Prolog list while
retaining the other nice properties of a list.


Adding an element to this hashtable list is constant time - excellent.

Accessing the first element and the tail ... we need to remember in
some way which is the element that was added most recently and be able
to go from the first to the second in constant time etc. No big deal:
we'll link the elements by a back pointer to the previous element.


Now some more tricky things:

1)
R = [1,2,3,4], L1 = [a|R], L2 = [b|R]

Do we now have one or three hashtables ? If we have one, how do we
represent internally the fact that the element b is in L2 and not in
L1 ?
If we have three, what is the cost of constructing the L1 hashtable
starting from the R hashtable ? Is it constant time (as should be the
case for lists) ?

2)
L = [1,2,3|Tail], Tail = [a,b,c]

Representing an open ended list with a (variant of a) hash table looks
doable.

But then we instantiate that open end with another hashtable
represented list - what do we do ? Merge the hashtables ? Is that
(small) constant time as we expect from instantiating the open end of
a list ?


3)

L = [A,B,C], A = 17, b = a, C = 17.

Here we have a list with variables that get instantiated later ?
That seems to mess up the hashtable structure massively ...


There seems to be no free lunch here.



> If it would, is it possible to implement lists (or possibly
> "sets") in such a way that the corresponding membership test would be
> O(log(n))?



The word "set" is crucial in your question: yes, you can implement
sets so that the membership test is O(log(n)) and it has been done
(check Prolog libraries or the net). But a set is not a list and a
list is not a set. And Prolog implementations don't implement Prolog
lists as a set because we want Prolog lists to have list properties in
the first place. You could wonder why Prolog does not have a built-in
set data type. I don't know.

Many Prolog built-ins use/produce/expect Prolog lists with list
properties. That's what forces you often to use a list while what you
want is actually a set. When that happens, and it is worth doing, you
just convert between the two. A god sort predicate is your friend :-)

Hope this helps.

Cheers

Bart Demoen


>
> ps. I am not being argumentative. These questions have been bugging me
> for a very long time. In fact, as I discovered in the course of
> composing this reply, my very first post in this newsgroup, in 1993,
> (<750066824.2@wyrm.rbbs-net.ORG> ) was about this very issue.
>
> --
> billh
>

student

2005-02-27, 3:58 am

Bart Demoen wrote:
> student wrote:
>
>
>
> You had written [earlier in this thread]:
>
forget.[color=darkred]
>


I made that statement at the very end of my question
(<s%NRd.7526$m31.93757@typhoon.sonic.net> ) to Salvador about his
response (<55lRd.466573$A7.673009@telenews.teleline.es> ) to W's
comment (<cv34hj01rfm@enews4.newsguy.com> ) about Mick's reply
(<37ejd3F5bmcm2U1@individual.net> ) to the OP; I had just learned
that the combination of 'msort' and 'unique3' was orders of
magnitude faster than 'unique2', and it is that discovery which
I found altogether very interesting and am not likely to forget
anytime soon.

> so I assumed you had made some conclusion - but I might be wrong.


Agreed.

If you ever do figure out what conclusion it was that you
thought I should perhaps reconsider, just let me know and I will
reconsider it.

[color=darkred]
> The difference in intrinsic complexity of the algorithms used:
> one is O(nlogn), the other O(n2) (I think this was pointed out
> earlier by someone else, no ?) + the fact that you use these
> algorithms on input that is large enough to make the sorting
> version faster.


[color=darkred]
> Large input and worst case analysis: the worst case for the
> unique2 program is that the number of different elements in the
> list is proportional to the length of the list. That gives you
> O(N2). O(NlogN) beats O(N2) for large input.


[color=darkred]
> Yes of course, but you might lose some other properties of
> Prolog lists that you really care a lot about as well.


I find that analysis convincing but there is still
something about it that seems arbitrary to me.

<philosophy>

The difference between

unique2(L1,L2) :-
unique_members2(L1,[],L2).

unique_members2([],In,In).
unique_members2([X|Rest],In,Out) :-
member(X,In),!,
unique_members2(Rest,In,Out).
unique_members2([X|Rest],In,Out) :-
unique_members2(Rest,[X|In],Out).

-- which, because it makes no assumptions about its input list,
has to check the current input element against all the elements
in the accumulator -- and

unique3(L1,L2) :-
unique_members3(L1,[],L2).

unique_members3([],In,In).
unique_members3([X|Rest],In,Out) :-
In=[X|_],
!,
unique_members3(Rest,In,Out).
unique_members3([X|Rest],In,Out) :-
unique_members3(Rest,[X|In],Out).

-- which, because it assumes that its input list has been
sorted, only has to check the current input element against the
first element in the acccumulator -- interests me very much, for
reasons that interest me even more.

As I said (in <750066824.2@wyrm.rbbs-net.ORG> ) way back in 1993,

> In my more rational moments I take the position that intelligibility
> _ought_ to be the best criterion of bestness and that if in certain
> circumstances it appears not to be then the fault lies with the
> underlying "computer" and not with the principle.


I believe that even more strongly today.

Despite the emphasis that is placed on what a predicate "does", I
think it is important not to lose sight of the fact that
language is /description/, a way of /seeing/, and that what a
predicate is /about/ is what it describes.

Typically, we take "the computer" as given and use it as the
standard of measure with which to compare the efficiency of
different logically equivalent predicates, but does the
computational efficiency of a predicate necessarily measure its
descriptive power?

Evidently not, at least not if "the computer" being used as the
standard of measure is an i386-based PC like mine.

'unique2' perfectly describes the class 'lists of unique
elements' in terms of the class 'lists' and that is what it
calls to my mind when I read it.

By contrast, it is not obvious what 'unique3' describes and I
draw a blank when I read it; in fact, 'unique3' makes virtually
no sense unless we know that it is about lists that have already
been sorted.

Yet the combination of 'my_sort' and 'unique3' is orders of
magnitude faster than 'unique2'!

That is a big problem for my principle and it forces me to ask:

Is this difference a consequence of some cruel Law of Nature
that punishes the simplicity and directness of 'unique2' and
rewards the more complicated and less apparent combination of
'my_msort' and 'unique3', or is it fundamentally a consequence
of the sequential nature of the underying (Turing-Post/von
Neuman) "computer"?

I find it hard to believe that the idea a computer that would
make it possible to implement Prolog in such a way that
'unique2' would be automatically be O(n*log(n)) or better falls
in the same category as the idea of a perpetual motion machine,
and I base that view in part on the amazing "computational"
abilities that some so-called autistic savants possess (see,
e.g.,
[url]http://www.guardian.co.uk/wend/story/0,,1409903,00.html[/url]),
abilities which in some cases seem almost not to involve time at
all.

Why not take predicates as the givens and use descriptive power
as the standard of measure by which to compare the relevancy of
radically different kinds of computers?

</philosophy>

Suppose I start with a slightly more abstract form of 'unique2':

abstract_unique(InputStructure,OutputStr
ucture) :-
abstract_empty_structure(Nil),
abstract_unique_members(InputStructure,N
il,OutputStructure).

abstract_unique_members(Structure,In,Out
) :-
abstract_cons(X,Rest,Structure),
!,
abstract_dispatcher(X,Rest,In,Out).
abstract_unique_members(_,In,In).

abstract_dispatcher(X,Rest,In,Out) :-
abstract_member(X,Rest),
!,
abstract_unique_members(Rest,In,Out).
abstract_dispatcher(X,Rest,In,Out) :-
abstract_cons(X,In,Next),
abstract_unique_members(Rest,Next,Out).

where

abstract_empty_structure/1

abstract_cons/3

and

abstract_membership_test/2

get specialized according to the particular structure (list,
avl, rbtree, etc) being tested.

Is there no kind of orderly structure of terms that allows
duplicates, yet can be "unique-ified" in O(n*log(n)) time?

(A rhetorical question which as a result of your kind reply I
now feel a bit better-equipped to explore in coming ws and
months, time and circumstances permitting.)

Thank you.

--
billh


Bart Demoen

2005-02-27, 4:00 pm

student wrote:

> 'unique2' perfectly describes the class 'lists of unique
> elements' in terms of the class 'lists' and that is what it
> calls to my mind when I read it.
>
> By contrast, it is not obvious what 'unique3' describes and I
> draw a blank when I read it; in fact, 'unique3' makes virtually
> no sense unless we know that it is about lists that have already
> been sorted.



I disagree: unique3 makes sense on any list, not just sorted lists.
It is just a matter of carefully phrasing it - talking procedurally:

unique3 removes duplicates when they are adjacent

?- unique3([a,a,b,c,c,a,a,a,d,a,x],L).

L = [x, a, d, a, c, b, a]

[unique3 also reverses the order, but that is not essential]

Trying to phrase it in a style you used about unique2, I would say

unique3 perfectly describes the class 'lists without adjacent
duplicates' in terms of the class 'lists'

Once it is obvious what unique3 means as a relation, it is also clear
that

unique2 == sort + unique3

as relations. I think that without making use of what unique3 is, one
can't prove the above equality.


>
> Yet the combination of 'my_sort' and 'unique3' is orders of
> magnitude faster than 'unique2'!
>
> That is a big problem for my principle and it forces me to ask:
>
> Is this difference a consequence of some cruel Law of Nature
> that punishes the simplicity and directness of 'unique2' and
> rewards the more complicated and less apparent combination of
> 'my_msort' and 'unique3',


One might wish that optimal simplicity results in optimality of other
criteria like complexity or performance, but it just seems not true.
There is something too simple about unique2 and maybe yes, some cruel
Law of Nature punishes solutions that are too simple. The same is true
for sorting: a simple sorting method like insertion sort is punished;
permutation sort is even more simple to write and closer to a logic
specification, and it sucks performance wise.


> or is it fundamentally a consequence
> of the sequential nature of the underying (Turing-Post/von
> Neuman) "computer"?


Maybe - the non-deterministic Turing machine does away (partly)
with sequentiality and you get (non-deterministic) polynomial
behaviour for "simple" programs - but the price you pay is that the
notion of complexity has changed drastically.


> Is there no kind of orderly structure of terms that allows
> duplicates, yet can be "unique-ified" in O(n*log(n)) time?


Sure there is, even better than O(n*log(n)) time !
Represent a collection with 3 b's and 2 a's and 5 c's as

[b/3, a/2, c/5]

and unique-ifying can be done in O(n). But it comes at a cost for
supporting other operations like having non-ground terms that you
can unify later, add an element or delete the most recently added
element ...
[this might all make no sense if you have a particular meaning of
"orderly" that I don't know]


Here is another example of the same fenomenon: TSP is NP-complete, but
I can build a representation of a graph that lets me solve TSP in
constant time ... only, if you give me nodes and edges incrementally
then building this representation takes me non-poly time (or I would
be very rich by now :-).


Cheers

Bart Demoen
tmp123

2005-02-28, 8:59 pm

Salvador Fandi=F1o wrote:
> W wrote:
through[color=darkred]
a lot of[color=darkred]
through[color=darkred]
a and b)[color=darkred]
rather[color=darkred]
>
> You are still on the O(N**2) side.
>
> A better algorithm is:
>
> - conveniently prepare the list for sorting [O(n)]
> - sort the list [O(NlogN)]
> - mark duplicates [O(N)] and
> - unsort the list while ignoring dups [O(N)]
>


About next algorithm:
Doubt1: is it OK? (no time to test it in deep).
Doubt2: is it O(N*log(N))?

| nodups(I,O) :- nodups(I,_,O).
| nodups([],_,[]).
| nodups([H|Q],TREE,O) :-
| tree(H,TREE,FOUND),
| ( FOUND=3D=3Dtrue ->
| nodups(Q,TREE,O)
| ;
| O=3D[H|QR],
| nodups(Q,TREE,QR)
| ).
|
| tree(H,[K,_,_],true) :- K=3D=3DH, !.
| tree(H,[K,_,_],fail) :- var(K), K=3DH, !.
| tree(H,[K,S,_],R) :-
| H @< K, !,
| tree(H,S,R).
| tree(H,[_,_,G],R) :-
| tree(H,G,R).

?- nodups([a,b,a,c],X).
X=3D[a,b,c].

Kind regards.

Bart Demoen

2005-02-28, 8:59 pm

NNTP-Posting-Host: seven.kulnet.kuleuven.ac.be
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: ikaria.belnet.be 1109624749 4017 134.58.127.12 (28 Feb 2005 21:05:49 GMT)
X-Complaints-To: abuse@belnet.be
NNTP-Posting-Date: Mon, 28 Feb 2005 21:05:49 +0000 (UTC)
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7) Gecko/20040616
X-Accept-Language: en-us, en
In-Reply-To: <1109620664.864001.304620@z14g2000cwz.googlegroups.com>
Cache-Post-Path: seven.kulnet.kuleuven.ac.be!unknown@10-91-97-252.kotnet.org
X-Cache: nntpcache 2.4.0b5 (see http://www.nntpcache.org/)
Xref: number1.nntp.dca.giganews.com comp.lang.prolog:27226

tmp123 wrote:

> About next algorithm:
> Doubt1: is it OK? (no time to test it in deep).
> Doubt2: is it O(N*log(N))?
>
> | nodups(I,O) :- nodups(I,_,O).
> | nodups([],_,[]).
> | nodups([H|Q],TREE,O) :-
> | tree(H,TREE,FOUND),
> | ( FOUND==true ->
> | nodups(Q,TREE,O)
> | ;
> | O=[H|QR],
> | nodups(Q,TREE,QR)
> | ).
> |
> | tree(H,[K,_,_],true) :- K==H, !.
> | tree(H,[K,_,_],fail) :- var(K), K=H, !.
> | tree(H,[K,S,_],R) :-
> | H @< K, !,
> | tree(H,S,R).
> | tree(H,[_,_,G],R) :-
> | tree(H,G,R).



It looks ok (I didn't test it thoroughly either) but it definitely isn't
O(N*log(N)) because the tree you build in the second arg of nodups/3 is
not balanced in a way that would make it of that complexity.
You can test that easily by added the following code:

makelist(N,Out) :-
(N < 0 ->
Out = []
;
Out = [N|R],
M is N - 1,
makelist(M,R)
).

test(N) :-
makelist(N,L),
time(nodups(L,_)).


and checking the following queries (and their results) in SWI-Prolog:

?- t(100).
% 31,110 inferences, 0.01 CPU in 0.01 seconds (121% CPU, 3111000 Lips)

Yes
?- t(200).
% 122,210 inferences, 0.04 CPU in 0.04 seconds (108% CPU, 3055250 Lips)

Yes
?- t(400).
% 484,410 inferences, 0.13 CPU in 0.13 seconds (98% CPU, 3726231 Lips)

Yes
?- t(800).
% 1,928,810 inferences, 0.52 CPU in 0.52 seconds (101% CPU, 3709250 Lips)

Yes

You can see the number of inferences and the time going up
quadratically.


Cheers

Bart Demoen
tmp123

2005-02-28, 8:59 pm

>
> | nodups(I,O) :- nodups(I,_,O).
> | nodups([],_,[]).
> | nodups([H|Q],TREE,O) :-
> | tree(H,TREE,FOUND),
> | ( FOUND==true ->
> | nodups(Q,TREE,O)
> | ;
> | O=[H|QR],
> | nodups(Q,TREE,QR)
> | ).
> |
> | tree(H,[K,_,_],true) :- K==H, !.
> | tree(H,[K,_,_],fail) :- var(K), K=H, !.
> | tree(H,[K,S,_],R) :-
> | H @< K, !,
> | tree(H,S,R).
> | tree(H,[_,_,G],R) :-
> | tree(H,G,R).
>
> ?- nodups([a,b,a,c],X).
> X=[a,b,c].
>


Performance measures:

For a list of 10000 random integers in [0,255]:

| make_list(0,[]).
| make_list(N,[H|Q]) :-
| H is random(256),
| Next is N -1,
| make_list(Next,Q).

"time" gives the results:

?- make_list(10000,X), time(nodups(X,Y)), length(Y,Z).
% 535,223 inferences in 0.15 seconds (3568153 Lips)

X = [167, 70, 252, 38, 138, 116, 184, 41, 58, 53|...]
Y = [167, 70, 252, 38, 138, 116, 184, 41, 58, 53|...]
Z = 256

?- make_list(10000,X), time(unique(X,Y)), length(Y,Z).
% 1,311,389 inferences in 0.82 seconds (1599255 Lips)

X = [11, 141, 91, 32, 72, 128, 181, 138, 162, 109|...]
Y = [50, 0, 196, 69, 10, 45, 79, 197, 66, 103|...]
Z = 256


PS: I suposse it could be faster if the tree is balanced.

tmp123

2005-02-28, 8:59 pm

Yes, I agree with your analisis.

However, in case of random numbers, it seems a few better:

?- main(1000). # 1000 elements
% 62,019 inferences in 0.02 seconds (3100950 Lips)
251 # length of resulting list:
251 different integers.

Yes
?- main(2000).
% 102,131 inferences in 0.03 seconds (3404367 Lips)
256

Yes
?- main(4000).
% 235,623 inferences in 0.07 seconds (3366043 Lips)
256

Yes
?- main(8000).
% 475,861 inferences in 0.12 seconds (3965508 Lips)
256

?- main(16000).
% 968,343 inferences in 0.26 seconds (3724396 Lips)
256

PS: Sugestion of a nice implementation of balanced tree?

Bart Demoen

2005-02-28, 8:59 pm

tmp123 wrote:
> Yes, I agree with your analisis.
>
> However, in case of random numbers, it seems a few better:



Random input doesn't tell you anything about worst case complexity.


> PS: Sugestion of a nice implementation of balanced tree?


Wirth, Niklaus - Algorithms + Datastructures = Programs.
Englewood Cliffs, Prentice-Hall, 1976

Cheers

Bart Demoen
student

2005-03-01, 8:58 pm

Bart Demoen wrote:
> student wrote:
>
>
>
>
> I disagree: unique3 makes sense on any list, not just sorted lists.
> It is just a matter of carefully phrasing it - talking procedurally:
>
> unique3 removes duplicates when they are adjacent
>
> ?- unique3([a,a,b,c,c,a,a,a,d,a,x],L).
>
> L = [x, a, d, a, c, b, a]
>
> [unique3 also reverses the order, but that is not essential]
>
> Trying to phrase it in a style you used about unique2, I would say
>
> unique3 perfectly describes the class 'lists without adjacent
> duplicates' in terms of the class 'lists'
>


Aha!

Altogether very interesting and not a lesson I am likely to soon
forget. :)


> Once it is obvious what unique3 means as a relation, it is also clear
> that
>
> unique2 == sort + unique3
>
> as relations. I think that without making use of what unique3 is, one
> can't prove the above equality.
>
>
[color=darkred]
>
>
> One might wish that optimal simplicity results in optimality of other
> criteria like complexity or performance, but it just seems not true.
> There is something too simple about unique2 and maybe yes, some cruel
> Law of Nature punishes solutions that are too simple. The same is true
> for sorting: a simple sorting method like insertion sort is punished;
> permutation sort is even more simple to write and closer to a logic
> specification, and it sucks performance wise.
>


>
>
>
> Maybe - the non-deterministic Turing machine does away (partly)
> with sequentiality and you get (non-deterministic) polynomial
> behaviour for "simple" programs - but the price you pay is that the
> notion of complexity has changed drastically.
>


Hmm, 'non-deterministic Turing machine'.

What is that and in what way does it change the notion of complexity?

>
>
>
> Sure there is, even better than O(n*log(n)) time !
> Represent a collection with 3 b's and 2 a's and 5 c's as
>
> [b/3, a/2, c/5]
>
> and unique-ifying can be done in O(n). But it comes at a cost for
> supporting other operations like having non-ground terms that you
> can unify later, add an element or delete the most recently added
> element ...
> [this might all make no sense if you have a particular meaning of
> "orderly" that I don't know]
>
>


I think I had a differet meaning of "dupicates" in mind.

If

[b/3, a/2, c/5, b/3]

were the input to the procedure I had in mind,

[b/3, a/2, c/5]

might be the output.


> Here is another example of the same fenomenon: TSP is NP-complete, but
> I can build a representation of a graph that lets me solve TSP in
> constant time ... only, if you give me nodes and edges incrementally
> then building this representation takes me non-poly time (or I would
> be very rich by now :-).
>
>


I will have to look up TSP, NP-complete in an encyclopedia.

Thank you!
--
billh
student

2005-03-02, 4:03 pm

Bart Demoen wrote:
> student wrote:
>
>
>
>
> I disagree: unique3 makes sense on any list, not just sorted lists.
> It is just a matter of carefully phrasing it - talking procedurally:
>
> unique3 removes duplicates when they are adjacent
>
> ?- unique3([a,a,b,c,c,a,a,a,d,a,x],L).
>
> L = [x, a, d, a, c, b, a]
>
> [unique3 also reverses the order, but that is not essential]
>
> Trying to phrase it in a style you used about unique2, I would say
>
> unique3 perfectly describes the class 'lists without adjacent
> duplicates' in terms of the class 'lists'
>



Aha!!!


> Once it is obvious what unique3 means as a relation, it is also clear
> that
>
> unique2 == sort + unique3
>
> as relations. I think that without making use of what unique3 is, one
> can't prove the above equality.
>
>


Beautiful.

[color=darkred]
>


(Please excuse the purple prose.)

>
> One might wish that optimal simplicity results in optimality of other
> criteria like complexity or performance, but it just seems not true.


> There is something too simple about unique2 and maybe yes, some cruel
> Law of Nature punishes solutions that are too simple. The same is true
> for sorting: a simple sorting method like insertion sort is punished;
> permutation sort is even more simple to write and closer to a logic
> specification, and it sucks performance wise.
>
>


Maybe, instead of bitching about the fact that 'unique2' is
orders of magnitude slower than 'my_msort + unique3', I should
be celebrating the fact that it is possible to transform an
obvious "starter" definition like 'unique2' into one that is
logically equivalent but orders of magnitude faster.

But these are worst case comparisons, right?

If I compare 'unique2' and 'my_msort + unique' on lists of
random integers -- which could arguably be said to be more
representative of data in the real world -- something
interesting happens:

?- rlist(10000,100,L),time( ((unique2(L,Q))) ).
% 510,254 inferences, 0.89 CPU in 0.88 seconds (101% CPU, 573319 Lips)

L = [99, 59, 10, 6, 14, 52, 32, 9, 18|...]
Q = [55, 70, 27, 87, 43, 34, 36, 5, 11|...]


?- rlist(10000,100,L),time( ((my_msort(L,S),unique3(S,Q))) ).
% 567,008 inferences, 2.42 CPU in 2.56 seconds (95% CPU, 234301 Lips)

L = [6, 38, 25, 15, 8, 64, 60, 53, 76|...]
S = [0, 0, 0, 0, 0, 0, 0, 0, 0|...]
Q = [99, 98, 97, 96, 95, 94, 93, 92, 91|...]

-- and these timings are not flukes (more on this below).


>
>
> Maybe - the non-deterministic Turing machine does away (partly)
> with sequentiality and you get (non-deterministic) polynomial
> behaviour for "simple" programs - but the price you pay is that the
> notion of complexity has changed drastically.
>


Is that bad?

What is 'non-deterministic Turing machine' and in what way does
it change the notion of complexity?

>
>
>
> Sure there is, even better than O(n*log(n)) time !
> Represent a collection with 3 b's and 2 a's and 5 c's as
>
> [b/3, a/2, c/5]
>
> and unique-ifying can be done in O(n). But it comes at a cost for
> supporting other operations like having non-ground terms that you
> can unify later, add an element or delete the most recently added
> element ...
> [this might all make no sense if you have a particular meaning of
> "orderly" that I don't know]
>
>


Plus the net cost of constructing it, which have be somewhat
more than the cost of directly applying unique2 to (say)

[c,b,a,c,c,a,b,c,b,c]

wouldn't it?

But, yes, that is pretty much what I had in mind, so suppose I
generalize that idea a little bit, and instead of just counting
the number of times a term occurs in the input list, I record
the positions where it occurs,e.g.,

[1,2,3,2,1,2,3,4,3,2,5,2] => [1/[1,5], 2/[2,4,6,10,12], 3/[3,7,9],
4/[8], 5/[11] ]

If I let

make_structure(L,M) :- distrib(1,L,[],M).

distrib(_,[],In,In).
distrib(K,[X|Xs],In,Out) :-
insert(X/K,In,Next),
K1 is K + 1,
distrib(K1,Xs,Next,Out).

insert(X/K,[X/KK|Rest],[X/[K|KK]|Rest]) :- !.
insert(X/K,[Y|Rest],[Y|Out]) :-
!,
insert(X/K,Rest,Out).
insert(X/K,[],[X/[K]]).

so that, for example,

?- rlist(10,5,L),time( ((make_structure(L,Q))) ).
% 50 inferences, 0.00 CPU in 0.00 seconds (0% CPU, Infinite Lips)

L = [1, 3, 0, 4, 1, 2, 3, 4, 1|...]
Q = [1/[9, 5, 1], 3/[7, 2], 0/[3], 4/[8, 4], 2/[10, 6]]

then, typically,

?- rlist(10000,100,L),time( ((make_structure(L,Q))) ).
% 523,432 inferences, 3.08 CPU in 3.22 seconds (96% CPU, 169945 Lips)

L = [87, 44, 6, 20, 70, 70, 83, 30, 35|...]
Q = [87/[9966, 9950, 9930, 9818, 9541, 9510, 9201|...], 44/[9935,
9872, 9861, 9815, 9786, 9729|...], 6/[9947, 9813, 9743, 9736, 9484|...],
20/[9943, 9887, 9862, 9660|...], 70/[9909, 9888, 9775|...], 83/[9984,
9842|...], 30/[9995|...], 35/[...|...], ... /...|...]

which I think compares quite favorably with

?- rlist(10000,100,L),time( ((my_msort(L,S),unique3(S,Q))) ).
% 567,465 inferences, 2.41 CPU in 2.56 seconds (94% CPU, 235463 Lips)

L = [41, 49, 35, 87, 43, 46, 93, 91, 36|...]
S = [0, 0, 0, 0, 0, 0, 0, 0, 0|...]
Q = [99, 98, 97, 96, 95, 94, 93, 92, 91|...]

and seems to have the added advantage of being able to handle
considerably longer input lists (e.g. N=100000) than 'my_msort'
can handle without crashing.

For example,

?- rlist(100000,100,L),time( ((make_structure(L,Q))) ).
% 5,248,761 inferences, 35.76 CPU in 37.64 seconds (95% CPU, 146777 Lips)

L = [86, 58, 0, 50, 74, 31, 82, 29, 38|...]
Q = [86/[99784, 99667, 99663, 99651, 99537, 99522, 99412|...],
58/[99921, 99529, 99491, 99425, 99226, 99223|...], 0/[99956, 99951,
99805, 99767, 99711|...], 50/[99872, 99807, 99788, 99695|...],
74/[99995, 99912, 99798|...], 31/[99945, 99692|...], 82/[99896|...],
29/[...|...], ... /...|...]

while

?- rlist(100000,100,L),time( ((my_msort(L,S),unique3(S,Q))) ).
% 2,202,144 inferences, 37.91 CPU in 39.94 seconds (95% CPU, 58089 Lips)
ERROR: Out of trail stack


So, to the extent that random lists of integers model data
acquisition in the real world, I think the fact that on that
kind of data 'unique2' does better than 'my_msort + unique3',
and 'make_structure' does almost as well as 'my_msort + unique'
and uses considerably fewer scarce computing resources in the
process, indicates that a purely abstract worst-case analysis
can be somewhat misleading.

> Here is another example of the same fenomenon: TSP is NP-complete, but
> I can build a representation of a graph that lets me solve TSP in
> constant time ... only, if you give me nodes and edges incrementally
> then building this representation takes me non-poly time (or I would
> be very rich by now :-).
>
>


Altogether very interesting and not a lesson I am likely to soon
forget. :)

Thank you.

--
billh






Salvador Fandiņo

2005-03-02, 4:03 pm

student wrote:

> Bart Demoen wrote:
>
>
> Plus the net cost of constructing it, which have be somewhat
> more than the cost of directly applying unique2 to (say)
>
> [c,b,a,c,c,a,b,c,b,c]
>
> wouldn't it?


no, it wouldn't, only if you use the wrong algorithm and data
structures :-)


> But, yes, that is pretty much what I had in mind, so suppose I
> generalize that idea a little bit, and instead of just counting
> the number of times a term occurs in the input list, I record
> the positions where it occurs,e.g.,
>
> [1,2,3,2,1,2,3,4,3,2,5,2] => [1/[1,5], 2/[2,4,6,10,12], 3/[3,7,9],
> 4/[8], 5/[11] ]
>
> If I let
>
> make_structure(L,M) :- distrib(1,L,[],M).
>
> distrib(_,[],In,In).
> distrib(K,[X|Xs],In,Out) :-
> insert(X/K,In,Next),
> K1 is K + 1,
> distrib(K1,Xs,Next,Out).
>
> insert(X/K,[X/KK|Rest],[X/[K|KK]|Rest]) :- !.
> insert(X/K,[Y|Rest],[Y|Out]) :-
> !,
> insert(X/K,Rest,Out).
> insert(X/K,[],[X/[K]]).
>
> so that, for example,
>
> ?- rlist(10,5,L),time( ((make_structure(L,Q))) ).
> % 50 inferences, 0.00 CPU in 0.00 seconds (0% CPU, Infinite Lips)
>
> L = [1, 3, 0, 4, 1, 2, 3, 4, 1|...]
> Q = [1/[9, 5, 1], 3/[7, 2], 0/[3], 4/[8, 4], 2/[10, 6]]
>
> then, typically,
>
> ?- rlist(10000,100,L),time( ((make_structure(L,Q))) ).
> % 523,432 inferences, 3.08 CPU in 3.22 seconds (96% CPU, 169945 Lips)
>
> L = [87, 44, 6, 20, 70, 70, 83, 30, 35|...]
> Q = [87/[9966, 9950, 9930, 9818, 9541, 9510, 9201|...], 44/[9935,
> 9872, 9861, 9815, 9786, 9729|...], 6/[9947, 9813, 9743, 9736, 9484|...],
> 20/[9943, 9887, 9862, 9660|...], 70/[9909, 9888, 9775|...], 83/[9984,
> 9842|...], 30/[9995|...], 35/[...|...], ... /...|...]
>
> which I think compares quite favorably with
>
> ?- rlist(10000,100,L),time( ((my_msort(L,S),unique3(S,Q))) ).
> % 567,465 inferences, 2.41 CPU in 2.56 seconds (94% CPU, 235463 Lips)
>
> L = [41, 49, 35, 87, 43, 46, 93, 91, 36|...]
> S = [0, 0, 0, 0, 0, 0, 0, 0, 0|...]
> Q = [99, 98, 97, 96, 95, 94, 93, 92, 91|...]
>
> and seems to have the added advantage of being able to handle
> considerably longer input lists (e.g. N=100000) than 'my_msort'
> can handle without crashing.
>
> For example,
>
> ?- rlist(100000,100,L),time( ((make_structure(L,Q))) ).
> % 5,248,761 inferences, 35.76 CPU in 37.64 seconds (95% CPU, 146777 Lips)
>
> L = [86, 58, 0, 50, 74, 31, 82, 29, 38|...]
> Q = [86/[99784, 99667, 99663, 99651, 99537, 99522, 99412|...],
> 58/[99921, 99529, 99491, 99425, 99226, 99223|...], 0/[99956, 99951,
> 99805, 99767, 99711|...], 50/[99872, 99807, 99788, 99695|...],
> 74/[99995, 99912, 99798|...], 31/[99945, 99692|...], 82/[99896|...],
> 29/[...|...], ... /...|...]
>
> while
>
> ?- rlist(100000,100,L),time( ((my_msort(L,S),unique3(S,Q))) ).
> % 2,202,144 inferences, 37.91 CPU in 39.94 seconds (95% CPU, 58089 Lips)
> ERROR: Out of trail stack


The last implementation of my_msort we saw from you was buggy
leaving choice points behind. I have run the same test with a
mergesort I have implemented and it doesn't crash:

?- rlist(100000, 100, L),
(time(make_structure(L, O));time(mergesort(L, O))).
% 5,257,937 inferences, 55.86 CPU in 57.10 seconds (98% CPU,
94127 Lips)

L = [10, 48, 58, 19, 75, 42, 43, 92, 33|...]
O = [10/[99827, 99686, 99676, 99222, 99169, 99084, 98938|...],
48/[99968, 99790, 99783, 99453, 99260, 99083|...], 58/[99467,
99185, 99135, 99036, 98908|...], 19/[99984, 99742, 99630,
99605|...], 75/[99964, 99905, 99863|...], 42/[99900, 99481|...],
43/[99837|...], 92/[...|...], ... /...|...] ;

% 7,546,301 inferences, 89.34 CPU in 90.33 seconds (99% CPU,
84467 Lips)

L = [10, 48, 58, 19, 75, 42, 43, 92, 33|...]
O = [0, 0, 0, 0, 0, 0, 0, 0, 0|...]

Yes
?-


>
> So, to the extent that random lists of integers model data
> acquisition in the real world, I think the fact that on that
> kind of data 'unique2' does better than 'my_msort + unique3',
> and 'make_structure' does almost as well as 'my_msort + unique'
> and uses considerably fewer scarce computing resources in the
> process, indicates that a purely abstract worst-case analysis
> can be somewhat misleading.


As misleading as an special-case analysis. The rlist/3 predicate
generates lists with elements from a finite dictionary (0..99).
Under this restriction unique2/2 and make_structure/2 become O(N)
algorithms and so they are faster that the ones based on sorting
that continue to be O(NlogN).

There is another algorithm that performs well in both situations:
convert the list to an AVL tree and then back to a list (I have
implemented it as unique3/2 and changed unique2/2 to use a pure
prolog version of member/2 for a fair comparison):

?- rlist(100000, 100, L), time(unique2(L, O1)), time(unique3(L, O2)).
% 15,146,647 inferences, 10.65 CPU in 10.78 seconds (99% CPU,
1422220 Lips)
% 2,825,663 inferences, 7.76 CPU in 7.96 seconds (97% CPU, 364132
Lips)

L = [26, 71, 30, 20, 34, 84, 40, 51, 32|...]
O1 = [18, 52, 27, 78, 65, 90, 81, 42, 79|...]
O2 = [0-0, 1-1, 2-2, 3-3, 4-4, 5-5, 6-6, 7-7, ... -...|...]

Yes
?- rlist(100000, 1000, L), time(unique2(L, O1)), time(unique3(L,
O2)).
% 149,788,120 inferences, 103.30 CPU in 104.18 seconds (99% CPU,
1450030 Lips)
% 4,509,200 inferences, 12.06 CPU in 12.36 seconds (98% CPU,
373897 Lips)

L = [476, 705, 876, 712, 293, 386, 15, 274, 748|...]
O1 = [737, 590, 684, 515, 69, 113, 111, 385, 19|...]
O2 = [0-0, 1-1, 2-2, 3-3, 4-4, 5-5, 6-6, 7-7, ... -...|...]

Yes
?- rlist(100000, 3000, L), time(unique2(L, O1)), time(unique3(L,
O2)).
% 443,369,624 inferences, 319.46 CPU in 355.58 seconds (90% CPU,
1387872 Lips)
% 5,312,667 inferences, 16.99 CPU in 17.66 seconds (96% CPU,
312694 Lips)

L = [1784, 340, 242, 92, 659, 866, 1305, 292, 1986|...]
O1 = [811, 434, 1516, 2002, 304, 1702, 2802, 231, 1921|...]
O2 = [0-0, 1-1, 2-2, 3-3, 4-4, 5-5, 6-6, 7-7, ... -...|...]


Cheers,

- Salva


---------------------------------------------------
/* -*- Mode:Prolog -*- */

:- module(mergesort, [mergesort/2, rlist/3, make_structure/2,
unique2/2, unique3/2]).

:- use_module(avl).

mergesort(A, B) :-
mergesort1(A, B1),
!,
B=B1.

mergesort1([], []) :- !.
mergesort1([A], [A]) :- !.
mergesort1(I, O) :-
divide(I, I1, I2),
mergesort1(I1, O1),
mergesort1(I2, O2),
my_merge(O1, O2, O).

divide(L, A, B) :-
div1(L, L, A, B).

div1([], L, [], L).
div1([_|T], [H|L], [H|A], B) :-
div2(T, L, A, B).

div2([], L, [], L).
div2([_|T], L, A, B) :-
div1(T, L, A, B).

my_merge([], B, B) :- !.
my_merge(L, [], L) :- !.
my_merge([H1|T1], [H2|T2], [H|T]) :-
( H2 @< H1
-> H = H2,
my_merge([H1|T1], T2, T)
; H = H1,
my_merge(T1, [H2|T2], T)).


make_structure(L,M) :-
distrib(1,L,[],M).

distrib(_,[],In,In).
distrib(K,[X|Xs],In,Out) :-
insert(X/K,In,Next),
K1 is K + 1,
distrib(K1,Xs,Next,Out).

insert(X/K,[X/KK|Rest],[X/[K|KK]|Rest]) :- !.
insert(X/K,[Y|Rest],[Y|Out]) :-
!,
insert(X/K,Rest,Out).
insert(X/K,[],[X/[K]]).

rlist(0, _, []) :- !.
rlist(N, Max, [H|T]) :-
H is random(Max),
N1 is N-1,
rlist(N1, Max, T).

list_to_avl([], T, T).
list_to_avl([H|L], T, O) :-
avl_put(T, H, H, T1),
list_to_avl(L, T1, O).

unique3(A, B) :-
list_to_avl(A, t, T),
avl_to_list(T, B).

unique2(A, B) :-
unique2(A, [], B).

mymember(E, [H|T]) :-
( E=H
-> true
; mymember(E, T) ).

unique2([], B, B).
unique2([H|T], I, O) :-
( mymember(H, I)
-> unique2(T, I, O)
; unique2(T, [H|I], O)).

Bart Demoen

2005-03-02, 8:58 pm

student wrote:

> What is 'non-deterministic Turing machine' and in what way does
> it change the notion of complexity?



Comp.lang.prolog is not the right place to go into this.
You could check wikipedia for instance, and more specific
newsgroups are devoted to it as well.

> [...]


Neither do I wish to go along the worst-case, average-case,
random-case, realistic-case path: programmers should be aware of
these cases, the difference between them and in the end chose
the solution (algorithm) which fits "best" their needs/context.
There is no single generally correct and best solution.

The answer by Salvador (which I didn't check in detail) seems to
addresses some relevant issues.

I just want to end with a warning (once more, and you might object):
do not draw conclusions too hastily from a few timings.

was the test case representative ?
was the code used "as good as possible/reasonable" ?
was the test case as X as I intended ?
(X = random, or realistic, or average, or bad ...)
were the effects of memory management included/excluded/ignored?
do I care about complexity or constant factors ?
...

Cheers

Bart Demoen
Bart Demoen

2005-03-02, 8:58 pm

Salvador Fandiņo wrote:
[...]

What I want to say is not really a reply to what Salvador wrote, just a
follow-up on the thread as a whole.

This thread started from a guenuine question/worry and developed into
.... well, you saw what it became.

I wish, I wish ... that the participants in this thread, or some of the
readers of this thread, make a summary document of what was discussed,
and what it resulted in as insights (or lessons not to be forgotten
soon :-)

If nothing else, it can serve as a reference document which the
comp.lang.prolog FAQ points at; but it might also turn out to become
something publishable in a workshop in declarative programming teaching
or whatever else.

I don't want to be involved necessarily - I mainly want to encourage
people to do something :-)

Cheers

Bart Demoen
student

2005-03-03, 8:57 am

Salvador Fandiņo wrote:
> student wrote:
>
>
>
>
>
> no, it wouldn't, only if you use the wrong algorithm and data structures
> :-)
>


Right.

But given the fastest possible way of picking out the distinct
terms from a list, without keeping count of the number of times
each of those distinct terms occurred, it seems to me that
keeping track of the respective numbers of occurrences of those
terms *in addition to* finding them must involve more work and
therefore take more time.

Lips)[color=darkred]
>
>
> The last implementation of my_msort we saw from you was buggy leaving
> choice points behind. I have run the same test with a mergesort I have
> implemented and it doesn't crash:
>
> ?- rlist(100000, 100, L),
> (time(make_structure(L, O));time(mergesort(L, O))).
> % 5,257,937 inferences, 55.86 CPU in 57.10 seconds (98% CPU, 94127 Lips)
>
> L = [10, 48, 58, 19, 75, 42, 43, 92, 33|...]
> O = [10/[99827, 99686, 99676, 99222, 99169, 99084, 98938|...],
> 48/[99968, 99790, 99783, 99453, 99260, 99083|...], 58/[99467, 99185,
> 99135, 99036, 98908|...], 19/[99984, 99742, 99630, 99605|...],
> 75/[99964, 99905, 99863|...], 42/[99900, 99481|...], 43/[99837|...],
> 92/[...|...], ... /...|...] ;
>
> % 7,546,301 inferences, 89.34 CPU in 90.33 seconds (99% CPU, 84467 Lips)
>
> L = [10, 48, 58, 19, 75, 42, 43, 92, 33|...]
> O = [0, 0, 0, 0, 0, 0, 0, 0, 0|...]
>
> Yes
>
>


Ok, make that

>


> As misleading as a special-case analysis.


> The rlist/3 predicate
> generates lists with elements from a finite dictionary (0..99). Under
> this restriction unique2/2 and make_structure/2 become O(N) algorithms
> and so they are faster that the ones based on sorting that continue to
> be O(NlogN).
>


Oh, oh.

(Here, 'my_msort' is your 'mergesort' and 'unique3' is my
'unique3'.)

?- rlist(100000,100,L), time( (unique2(L, O1)) ),
time((my_msort(L,S),unique3(S,O2))).
% 5,164,224 inferences, 8.86 CPU in 8.86 seconds (100% CPU, 582870 Lips)
% 7,746,857 inferences, 77.70 CPU in 78.26 seconds (99% CPU, 99702 Lips)

?- rlist(100000,1000,L), time( (unique2(L, O1)) ),
time((my_msort(L,S),unique3(S,O2))).
% 49,920,011 inferences, 86.37 CPU in 93.50 seconds (92% CPU, 577979 Lips)
% 7,754,465 inferences, 77.30 CPU in 79.39 seconds (97% CPU, 100316 Lips)

?- rlist(100000,3000,L), time( (unique2(L, O1)) ),
time((my_msort(L,S),unique3(S,O2))).
% 148,167,120 inferences, 256.43 CPU in 257.85 seconds (99% CPU, 577807
Lips)
% 7,754,788 inferences, 81.23 CPU in 81.78 seconds (99% CPU, 95467 Lips)


Also,

?- rlist(100000, 100, L), time(make_structure(L, O2)).
% 5,247,665 inferences, 48.57 CPU in 52.91 seconds (92% CPU, 108043 Lips)

?- rlist(100000, 1000, L), time(make_structure(L, O2)).
% 50,102,633 inferences, 482.78 CPU in 533.91 seconds (90% CPU, 103779 Lips)

?- rlist(100000, 3000, L), time(make_structure(L, O2)).
% 147,909,022 inferences, 1492.62 CPU in 1780.39 seconds (84% CPU, 99094
Lips)

Egad.

What was I thinking?

If I had simply asked why it was that 'unique2' appeared to
do better than, and 'make_structure' and almost as well as,
'my_msort + unique' on lists of integers randomly selected from
the range (0..99), I'd be dancing on the table now!

Ah, well, live and learn.

Do I have the picture?

-----------------------------------------------------------
CPU
-----------------------------------------------------------
n k unique2 make_structure my_msort+unique3
-----------------------------------------------------------
100000 100 10 50 80
100000 1000 100 500 80
100000 3000 300 1500 80
----------------------------------------------------------

For given n, 'unique2' is O(k/10) and 'make_structure' is
O(5*(k/10)), where k is the size of the dictionary, while
increasing k has very little effect on 'my_msort + unique3'?

Why should that be?

Let me think about that for a minute.

Why should it take 'unique2' longer to eliminate duplicates
from a random list of integers in the range [1..1000], say,
than from a random list of integers in the range [1..10]?

Aha. Of course. The greater the number of different random
choices possible, the greater will be the number of different
random choices made, so that if the range is [0..9] the length
of the list of unique elements that 'unique2' has to check
before it admits a new member can be at most 10, but if the
range is [1..999], that list may contain as many as 1000
different elements.

But what is special about (0..99)?

Why do "unique2/2 and make_structure/2 become O(N) algorithms
and [thus] ... faster than the ones based on sorting that
continue to be O(NlogN)" when the range is (0..99)?


> There is another algorithm that performs well *** in both situations:

***

"both situations" meaning lists of elements randomly selected
from a finite dictionary versus lists of elements randomly
selected from an infinite dictionary, I take it.

> convert the list to an AVL tree and then back to a list (I have
> implemented it as unique3/2 and changed unique2/2 to use a pure prolog
> version of member/2 for a fair comparison):
>
> ?- rlist(100000, 100, L), time(unique2(L, O1)), time(unique3(L, O2)).
> % 15,146,647 inferences, 10.65 CPU in 10.78 seconds (99% CPU, 1422220

Lips)
> % 2,825,663 inferences, 7.76 CPU in 7.96 seconds (97% CPU, 364132 Lips)
>
>
> ?- rlist(100000, 1000, L), time(unique2(L, O1)), time(unique3(L, O2)).
> % 149,788,120 inferences, 103.30 CPU in 104.18 seconds (99% CPU, 1450030
> Lips)
> % 4,509,200 inferences, 12.06 CPU in 12.36 seconds (98% CPU, 373897 Lips)
>
> ?- rlist(100000, 3000, L), time(unique2(L, O1)), time(unique3(L, O2)).
> % 443,369,624 inferences, 319.46 CPU in 355.58 seconds (90% CPU, 1387872
> Lips)
> % 5,312,667 inferences, 16.99 CPU in 17.66 seconds (96% CPU, 312694 Lips)
>


Not only that, but your 'unique3' appears to be roughly 4 times as fast
as the sort-based method:

?- rlist(100000,3000,L), time( (unique2(L, O1)) ),
time((my_msort(L,S),unique3(S,O2))).
% 148,167,120 inferences, 256.43 CPU in 257.85 seconds (99% CPU, 577807
Lips)
% 7,754,788 inferences, 81.23 CPU in 81.78 seconds (99% CPU, 95467 Lips)

Whew.

Altogether an extremely interesting lesson and one that I am not
likely /ever/ to forget! :)

Thank you.

Bill

p.s.

I would have liked to experiment with your

> unique3(A, B) :-
> list_to_avl(A, t, T),
> avl_to_list(T, B).


> list_to_avl([], T, T).
> list_to_avl([H|L], T, O) :-
> avl_put(T, H, H, T1),
> list_to_avl(L, T1, O).
>


but I was unable to locate source code for 'avl_put/4' and 'avl_to_list/2'.

Henrik

2005-03-03, 3:58 pm

Bart Demoen <bmd@cs.kuleuven.ac.be> wrote in message news:<1109626589.505367@seven.kulnet.kuleuven.ac.be>...
> tmp123 wrote:
>
>
> Random input doesn't tell you anything about worst case complexity.
>


Does random input equal input distributed according to the uniform
distribution?
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com