For Programmers: Free Programming Magazines  


Home > Archive > APL > October 2007 > APL2007 update









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author APL2007 update
Mike Kent

2007-10-03, 7:57 am

APL 2007 in Montreal (only 2 1/2 ws away, Oct 20-22).

Summary program information is now available
on the APL2007 home page

http://www.sigapl.org/apl2007.html

with a link to the comprehensive program description at

http://www.sigapl.org/apl2007-program.html#a2


Registration for APL2007 is at

http://www.regmaster.com/conf/oopsla2007.html
Beliavsky

2007-10-03, 7:57 am

I wonder if any APL programmers have tried Fortran 90 or a later
version of the language. F90 introduced array operations into the
language, which is something APL is known for. Considering that
Fortran compilers are more widely available than APL interpreters,
what are the main advantages of the latter that keep some people using
it? What about APL vs. another interpreted language such as Octave of
Python+Numpy?

The repeated announcements of an APL conference in comp.lang.fortran
are what "provoked" me to ask this question.

Michael Metcalf

2007-10-03, 6:58 pm


"Beliavsky" <beliavsky@aol.com> wrote in message
news:1191415778.387994.323090@n39g2000hsh.googlegroups.com...
>I wonder if any APL programmers have tried Fortran 90 or a later
> version of the language. F90 introduced array operations into the
> language, which is something APL is known for. Considering that
> Fortran compilers are more widely available than APL interpreters,
> what are the main advantages of the latter that keep some people using
> it? What about APL vs. another interpreted language such as Octave of
> Python+Numpy?
>

At the time I briefly used APL, around 1981, its fans claimed that, as an
interpreted language, it could be used very effectively to develop
algorithms. For production purposes, these would then be translated into
Fortran. So, the one did not exclude the other.

Regards,

Mike Metcalf


glarocque@cfl.forestry.ca

2007-10-03, 6:58 pm

On Oct 3, 8:49 am, Beliavsky <beliav...@aol.com> wrote:
> I wonder if any APL programmers have tried Fortran 90 or a later
> version of the language. F90 introduced array operations into the
> language, which is something APL is known for. Considering that
> Fortran compilers are more widely available than APL interpreters,
> what are the main advantages of the latter that keep some people using
> it? What about APL vs. another interpreted language such as Octave of
> Python+Numpy?
>
> The repeated announcements of an APL conference in comp.lang.fortran
> are what "provoked" me to ask this question.


The announcement of an APL conference is APL2007, and it is organized
by SIGAPL.
This conference will focus on Array Programming Languages, not only on
the
traditional APL. So users of Fortran 90, J, Octave and Python and
others Array
Programming Languages are more than welcome to come to Montreal and
participate
in the debate. It has been the policy of SIGAPL to promote links among
users
of all Array Programming Languages for several years.

Guy Larocque
SIGAPL Chairman


Martin Neitzel

2007-10-03, 6:58 pm

beliavsky@aol.com wrote:
> I wonder if any APL programmers have tried Fortran 90 or a later
> version of the language.


Bob Bernecky did so, and he wrote a fine description why F90 is
still miles away from APL. I'll dig out the exact APL QQ reference
tomorrow. (IIRC the main point was that F90 has a rather constrained
set of array operations; REDUCE exists is limited to a few primitive
arithmetic operations only.)

Martin Neitzel
jk

2007-10-03, 6:58 pm


"Beliavsky" <beliavsky@aol.com> wrote in message
news:1191415778.387994.323090@n39g2000hsh.googlegroups.com...
>I wonder if any APL programmers have tried Fortran 90 or a later
> version of the language. F90 introduced array operations into the
> language, which is something APL is known for. Considering that
> Fortran compilers are more widely available than APL interpreters,
> what are the main advantages of the latter that keep some people using
> it? [ ... snipped]


Yes, I did. As an exercise I re-coded Phil Benkard's SUMROUND-function
(APL91 - Stanford) in Fortran90. The function was rounding of values to
a given number of decimals (or rather powers of 10) in the left arg, with the
requirement that the sum of the rounded off values should exactly equal the
rounded off sum of the bare values (e.g. as required in financial reports).
Executing the Fortran90 on 10000 values gave me time not only to get
and drink coffee, but even to collect and burn the beans.
That's why I stuck to APL.

An other funny story is that in some project the "programmer's team" wanted
to see Fortran code, while the work was already done in APL by the responsible
engineers satisfactorily (funny? silly? but true!).
The APL-people decided to write an APL code that produced the thing in
Fortran, so, they had their Fortran code (APL Toronto '93 - it's on video tape).
But why? How would you ever write, test and debug 20.000 lines of Fortran?
S.E & O.

Jan Karman
http://www.ganuenta.com



jk

2007-10-04, 3:57 am


"jk" <*axy*@planet.nl> wrote in message
news:4704113d$1$25500$ba620dc5@text.nova.planet.nl...
>
> "Beliavsky" <beliavsky@aol.com> wrote in message
> news:1191415778.387994.323090@n39g2000hsh.googlegroups.com...
>
> Yes, I did. As an exercise I re-coded Phil Benkard's SUMROUND-function
> (APL91 - Stanford) in Fortran90. The function was rounding of values to
> a given number of decimals (or rather powers of 10) in the left arg, with the
> requirement that the sum of the rounded off values should exactly equal the
> rounded off sum of the bare values (e.g. as required in financial reports).
> Executing the Fortran90 on 10000 values gave me time not only to get
> and drink coffee, but even to collect and burn the beans.
> That's why I stuck to APL.
>
> An other funny story is that in some project the "programmer's team" wanted
> to see Fortran code, while the work was already done in APL by the responsible
> engineers satisfactorily (funny? silly? but true!).
> The APL-people decided to write an APL code that produced the thing in
> Fortran, so, they had their Fortran code (APL Toronto '93 - it's on video
> tape).
> But why? How would you ever write, test and debug 20.000 lines of Fortran?
> S.E & O.
>


As Phil Benkard put it that time in his paper 'A Dance of Rounds' litteraly:
"Rounding off is a bit of lying."
How much are you lying? In the platina grey pre-computer days my actuarial
science courses provided a few simple lessons in numerical analysis on the
topics of 'shorthand multiplication and division'. The aim was to know how much
exactly you were lying in rounding off. E.g. 23.456 x 53.14, in which 53.14 had
been somewhere between 53.135xx and 53.144xx, myriads of those manually
with the comptometer or sumlock, later by electrical machines. Not surprising
that Phil's routine is still one of my favourite subjects - in fact it's a
beauty! I'll remember it "when I'm 94 ...", because Phil's function did exactly
what I had learned far back in the past.

The fact that the APL routine consisted of say 30 characters (including the
header, line-numbering and both the del's, that is) and the Fortran90 code
needed two pages A4 is, in the meantime, a corny, if not a stale remark.
Reasoning backwards, it's a very instructive excercise to take a sheet of paper
and pencil and construct a table with a real example, following the process in
Phil's APL function (backwards of course, each primitive a column) in order to
see what's happening. This indepth paper & pencil analysis is indispensible
when coding in Fortran(90).
By the way, web-teaching, which is poisoning our students these days, will never
cross this sort of exercises, although they are crucial in learning.

Re the Fortran90 routine: I had to cancel the process at five by <Ctrl>+C.
The APL-function took a few seconds (on a VAX/VMS 750 system in VAX APL).

You might wonder how K is performing in this funtion. In K, you rather talk
about an array of say 500,000 or 6,000,000 items (remember that a middle-sized
life-insurance office is a billion half penny's business).
An always attractive, intriguing, what do I say, seducing property of this sort
of APL-routines is that they often can be mirrored into K (I'm now talking
about the body of the function). The characters will be different but the
primitive functions are the same. Here is the K-routine:

f:{x*(_ y)+(<>d)<-0.5+/d:(y:y%x)!1}

It's identical to Phil's APL routine, and you even may now reversively mirror to
APL from it, x and y being right and left arg respectively.
The sequence "<>" is the bottleneck for the Fortran code: you'll need the
indices from the grading up and down! They are "wheighing" the values to be
rounded, deciding whether it's going up or down. (Look at the paper with the
analysis).
The Fortran90 SORT routine only gives you the sorted array - who needs it?

I had expected the K-routine to performing sub-second on a (few) million items,
but it's still between one and two on my PC. That figures the tax of the
function to the CPU.

A J-function would probably look a bit different - I'm not familiar enough with
J to provide the routine. Eugene McDonnell will certainly have done so in one of
his "At Play with J".

Happy Fortranning!!
Jan Karman
http://www.ganuenta.com





Walter Spector

2007-10-04, 7:57 am

jk wrote:
> ...
> The sequence "<>" is the bottleneck for the Fortran code: you'll need the
> indices from the grading up and down! They are "wheighing" the values to be
> rounded, deciding whether it's going up or down. (Look at the paper with the
> analysis).
> The Fortran90 SORT routine only gives you the sorted array - who needs it?


I used DEC-10 APLSF a lot back in the 1970s. Also used the CDC 6000/Cyber
UMASS version some. Fun language!

Sadly, Fortran does not have a Standardized SORT routine. Some Fortran
implementations do offer entry points into the qsort(3c) routine.
But these are slow due to the necessary calls to the user-defined
comparison routine.

The High Performance Fortran (HPF) Standard had GRADE_UP and GRADE_DOWN
intrinsics defined. (I wonder where they got those names from? ;)
HPF 2.0 defined SORT_UP and SORT_DOWN intrinsics. However HPF is mostly
ignored by vendors these days. Even the good parts.

W.
dpb

2007-10-04, 6:58 pm

jk wrote:
> "Beliavsky" <beliavsky@aol.com> wrote in message
> news:1191415778.387994.323090@n39g2000hsh.googlegroups.com...
>
> Yes, I did. As an exercise I re-coded Phil Benkard's SUMROUND-function
> (APL91 - Stanford) in Fortran90. The function was rounding of values to
> a given number of decimals (or rather powers of 10) in the left arg, with the
> requirement that the sum of the rounded off values should exactly equal the
> rounded off sum of the bare values (e.g. as required in financial reports).
> Executing the Fortran90 on 10000 values gave me time not only to get
> and drink coffee, but even to collect and burn the beans.

....

Accepting it took more source code, it would seem the actual operations
required would end up roughly the same on the same hardware and similar
levels of optimization...what would reasonably say otherwise that APL
can do the same processing in significantly fewer machine operations?

--
jk

2007-10-04, 6:58 pm


"dpb" <none@non.net> wrote in message news:fe2tf3$739$1@aioe.org...
> jk wrote:
> ...
>
> Accepting it took more source code, it would seem the actual operations
> required would end up roughly the same on the same hardware and similar levels
> of optimization...what would reasonably say otherwise that APL can do the same
> processing in significantly fewer machine operations?
>


This is a bit beyond my scope, but I'd say that the bottleneck for Fortran lies
in the sorting, the APL-{grade-up} {grade-down}, in other words a different
approach of sorting. In that operation APL gives you the indices for the array
to be sorted, while Fortran gives - by a runtime routine - the sorted array.
Please, see also the later postings.
Sorting is a big thing in computing. Performance may stand or fall with the
sorting algorithm. As far as I know, e.g. Dyalog uses Hoare's
algorithm for sorting - the fastest one known at present (I was told by somebody
who is supposed to know).




Dan Nagle

2007-10-04, 6:58 pm

Hello,

jk wrote:

> This is a bit beyond my scope, but I'd say that the bottleneck for Fortran lies
> in the sorting, the APL-{grade-up} {grade-down}, in other words a different
> approach of sorting. In that operation APL gives you the indices for the array
> to be sorted, while Fortran gives - by a runtime routine - the sorted array.


Well, there is no Fortran sort. So the design
of the external sort routine is determined by the author
of the external sort routine.

Also, the amount of memory traffic (I presume that's the issue
you're implying here) will be the same for sorting the indices,
usually Fortran default integers, as for any other data type
requiring one numeric storage unit.

Perhaps there should be a standard Fortran sort, but as of the present,
there isn't.

--

Dan Nagle
Purple Sage Computing Solutions, Inc.
dpb

2007-10-04, 6:58 pm

jk wrote:
> "dpb" <none@non.net> wrote in message news:fe2tf3$739$1@aioe.org...
>
> This is a bit beyond my scope, but I'd say that the bottleneck for Fortran lies
> in the sorting, the APL-{grade-up} {grade-down}, in other words a different
> approach of sorting. In that operation APL gives you the indices for the array
> to be sorted, while Fortran gives - by a runtime routine - the sorted array.
> Please, see also the later postings.
> Sorting is a big thing in computing. Performance may stand or fall with the
> sorting algorithm. As far as I know, e.g. Dyalog uses Hoare's
> algorithm for sorting - the fastest one known at present (I was told by somebody
> who is supposed to know).


I would say that is essentially unrelated to the language itself then.
Whatever internal implementation of a SORT intrinsic isn't mandated by
the Fortran Standard. I don't know enough of APL to know whether it
would be so that all implementations would use the same algorithm or not.

Either way, whether any particular algorithm is optimal for sorting is
dependent on the data and I don't believe there's any single fastest in
a global sense.

But, it does answer the underlying question and the "Say _what_?" factor
of the previous posting, thanks...

--
Richard Maine

2007-10-04, 6:58 pm

Dan Nagle <dannagle@verizon.net> wrote:

> jk wrote:
>
[color=darkred]
> Well, there is no Fortran sort.


Which is to say that the criticism seems to have nothing to do with
Fortran, but rather with an author who either did not know Fortran or
chose not to use it. Doesn't sound like a very good basis for judgement
to me.

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
dpb

2007-10-04, 6:58 pm

dpb wrote:
> jk wrote:
>
> I would say that is essentially unrelated to the language itself then.
> Whatever internal implementation of a SORT intrinsic isn't mandated by
> the Fortran Standard. ...


Actually, there isn't a SORT intrinsic at all, what I was thinking of is
a compiler-specific routine that is an extension of the particular
compiler I happen to use most frequently...

--
Bob Smith

2007-10-04, 6:58 pm

jk wrote:
[...]
> Sorting is a big thing in computing. Performance may stand or fall with the
> sorting algorithm. As far as I know, e.g. Dyalog uses Hoare's
> algorithm for sorting - the fastest one known at present (I was told by somebody
> who is supposed to know).


I'd be surprised if Dyalog uses Hoare's algorithm (also known as
Quicksort) as it is not a stable sort.

--
________________________________________
_
Bob Smith -- bsmith@sudleydeplacespam.com

To reply to me directly, delete "despam".
jk

2007-10-04, 6:58 pm


"dpb" <none@non.net> wrote in message news:fe35d5$vtm$1@aioe.org...
> jk wrote:
>
> I would say that is essentially unrelated to the language itself then.
> Whatever internal implementation of a SORT intrinsic isn't mandated by the
> Fortran Standard. I don't know enough of APL to know whether it would be so
> that all implementations would use the same algorithm or not.
>
> Either way, whether any particular algorithm is optimal for sorting is
> dependent on the data and I don't believe there's any single fastest in a
> global sense.
>
> But, it does answer the underlying question and the "Say _what_?" factor of
> the previous posting, thanks...
>
> --


In the Fortran routine I used the external routine ('runtime routine') that came
with the compiler;
APL has it's own {grade-up}, {grade-down} - but we write 199x ? and I haven't
seen anything since. I had other APL-funtions that tumbled on "sorting" (not
really, but when benchmarked). Some authors wrote entire books on sorting
(Flores, Knuth, Lorin and others), so it must be something wothwhile to think of
....
All I can say is that it seems the system was apparantly working and working on
the sorting part. (It reminds me of the paper of Dan King on Arthur Whitney's,
while demo-ing his Kdb, where a manager said: "Well, I can do that too with my
system" - making his machines hang, in the meantime telling that he "really must
go by now" ...


jk

2007-10-04, 6:58 pm

Bob,

Pete Donnelly told me ... about 10 years ago.

"Bob Smith" <bsmith@sudleydeplacespam.com> wrote in message
news:jm9Ni.18340$B25.11725@news01.roc.ny...
> jk wrote:
> [...]
>
> I'd be surprised if Dyalog uses Hoare's algorithm (also known as Quicksort) as
> it is not a stable sort.
>
> --
> ________________________________________
_
> Bob Smith -- bsmith@sudleydeplacespam.com
>
> To reply to me directly, delete "despam".




glen herrmannsfeldt

2007-10-04, 6:58 pm

Bob Smith wrote:

(snip on APL and sorting)

> I'd be surprised if Dyalog uses Hoare's algorithm (also known as
> Quicksort) as it is not a stable sort.


You can always add the original position to the comparison function.
If you are generating the index vector, you start with an array with
the index values in it and rearrange that based on the comparisons.
In the case of a tie, use the index values to break the tie.

-- glen

Seth

2007-10-04, 6:58 pm

In article <4704113d$1$25500$ba620dc5@text.nova.planet.nl>,
jk <*axy*@planet.nl> wrote:

>Yes, I did. As an exercise I re-coded Phil Benkard's SUMROUND-function
>(APL91 - Stanford) in Fortran90. The function was rounding of values to
>a given number of decimals (or rather powers of 10) in the left arg, with the
>requirement that the sum of the rounded off values should exactly equal the
>rounded off sum of the bare values (e.g. as required in financial reports).


"Distributive rounding" I remember doing that in the late 1970's in
APL 5100. My sarcastic comment was about the bank's Board of
Directors being able to check if numbers added up.

Doing it in two dimensions makes life interesting.

Even in one dimension, the problem isn't quite so simple: is it more
harmful to round 1.4 up to 2, or 10.3 up to 11? Your inputs are 1.4,
10.3, and 30 X.01's.

Seth
Seth

2007-10-04, 6:58 pm

In article <47050218$0$25474$ba620dc5@text.nova.planet.nl>,
jk <*axy*@planet.nl> wrote:
>"dpb" <none@non.net> wrote in message news:fe2tf3$739$1@aioe.org...


>
>This is a bit beyond my scope, but I'd say that the bottleneck for Fortran lies
>in the sorting, the APL-{grade-up} {grade-down}, in other words a different
>approach of sorting. In that operation APL gives you the indices for the array
>to be sorted, while Fortran gives - by a runtime routine - the sorted array.


Actually, for this underlying problem, you don't need to sort. It can
be solved in linear time. (You're looking for the M largest out of N,
which can be done in O(N) comparisons.)

I've seen the algorithm. I'd rather sort.

Seth
dpb

2007-10-04, 6:58 pm

jk wrote:
> "dpb" <none@non.net> wrote in message news:fe35d5$vtm$1@aioe.org...
>
> In the Fortran routine I used the external routine ('runtime routine') that came
> with the compiler;
> APL has it's own {grade-up}, {grade-down} - but we write 199x ? and I haven't
> seen anything since. I had other APL-funtions that tumbled on "sorting" (not
> really, but when benchmarked). Some authors wrote entire books on sorting
> (Flores, Knuth, Lorin and others), so it must be something wothwhile to think of
> ...
> All I can say is that it seems the system was apparantly working and working on
> the sorting part. ...


Well, certainly sorting is a significant problem or why, as you say, are
there entire treatises about it?

But, that a particular implementation of a vendor-specific extension in
one compiler is or isn't suitable for a particular task is nothing
meaningful by which to compare languages.

--
jk

2007-10-04, 6:58 pm


"Seth" <sethb@panix.com> wrote in message news:fe3crq$dbq$1@reader1.panix.com...
> In article <4704113d$1$25500$ba620dc5@text.nova.planet.nl>,
> jk <*axy*@planet.nl> wrote:
>
>
> "Distributive rounding" I remember doing that in the late 1970's in
> APL 5100. My sarcastic comment was about the bank's Board of
> Directors being able to check if numbers added up.
>
> Doing it in two dimensions makes life interesting.
>
> Even in one dimension, the problem isn't quite so simple: is it more
> harmful to round 1.4 up to 2, or 10.3 up to 11? Your inputs are 1.4,
> 10.3, and 30 X.01's.
>
> Seth


Interesting. If I remember correctly, in the old days, you rounded off nearby,
then you did the addition of the new values, than you assessed the error and
finally you corrected the items with the relatively least difference, one by
one. That's in fact what Phil's function is doing, it's hardly comprehensible at
a first glance. You really need a simple example to see it.


James J. Weinkam

2007-10-04, 6:58 pm

glen herrmannsfeldt wrote:
> Bob Smith wrote:
>
> (snip on APL and sorting)
>
>
> You can always add the original position to the comparison function.
> If you are generating the index vector, you start with an array with
> the index values in it and rearrange that based on the comparisons.
> In the case of a tie, use the index values to break the tie.
>
> -- glen
>


This message has long lines that will display best with line wrap set to 100.


What you say is true, but stability is not the only issue. Quicksort's worst case performance is
n*2 not n{times}2{log}n; admittedly the worst case isn't very likely to occur and there are defenses
against it but it's still a blemish.

Also Quicksort's legendary performance applies primarily to its application to sorting an array of
numbers which is not the typical use of sorting in serious applications. The following is from a
post I made to the PL/I newsgroup when a similar topic was discussed there back in 2004:



Mark Yudkin wrote:

.... 5-1/2 pages giving the Microsoft implementation of Quicksort in C

Wow, 5-1/2 pages just for a quick sort routine and not very carefully checked.

Here is my transliteration of the C function:

* process mar(2,100); /* qsortmv */
qsortmv: proc(a,n,w,c) reorder;
/* jjw2004/06/13 -- Conversion of Microsoft's C qsort function */
/* call qsortmv(a,n,w,c); where a points to an array of n items of width w, */
/* c(p,q) returns -1,0, or 1 based on the relationship between the two items */
dcl
(n,w,s,sz) bin fixed(31),
(a,l,h,m,lg,hg,(sl,sh)(30)) ptr,
c entry(ptr,ptr) returns(bin fixed(31)) options(byvalue);
if n<2|w=0 then return;
s=1; l=a; h=a+w*(n-1);
recurse: sz=(h-l)/w+1;
if sz<=8 then call short(l,h,w,c);
else do;
m=l+(sz/2)*w; call swap(m,l,w);
lg=l; hg=h+w;
do forever;
do forever; lg+=w; if lg<=h then if c(lg,l)<=0 then; else leave; else leave; end;
do forever; hg-=w; if hg>l then if c(hg,l)>=0 then; else leave; else leave; end;
if hg<lg then leave;
call swap(lg,hg,w);
end;
call swap(l,hg,w);
if hg-w-l>=h-hg then do;
if l+w<hg then do; sl(s)=l; sh(s)=hg-w; s+=1; end;
if lg<h then do; l=lg; goto recurse; end;
end;
else do;
if lg<h then do; sl(s)=lg; sh(s)=h; s+=1; end;
if l+w<hg then do; h=hg-w; goto recurse; end;
end;
end;
s-=1; if s>0 then do; l=sl(s); h=sh(s); goto recurse; end;
short: proc(l,h,w,c) options(byvalue);
dcl (l,h,p,m) ptr, w bin fixed(31),
c entry(ptr,ptr) returns(bin fixed(31)) options(byvalue);
do while(h>l); m=l;
do p=l+w repeat(p+w) while(p<=h); if c(p,m)>0 then m=p; end;
call swap(m,h,w); h-=w;
end;
end short;
swap: proc(a,b,w) options(byvalue);
dcl (a,b) ptr, w bin fixed(31), (c based,t) char(1);
if a~=b then do while(w>0); t=a->c; a->c=b->c; b->c=t; a+=1; b+=1; w-=1; end;
end swap;
end qsortmv;


The real question is how good is this as a general internal sort?

Another question is why is nearly everyone so enamored with Quicksort?

Certainly it is an elegant algorithm. Its average performance is also faster
than most other methods, at least in the textbook setting of sorting an array
of integers.

On the other hand it has some serious drawbacks: Its worst case performance is
O(n**2) and it is not stable. (A sorting method is stable if it does not
permute items that compare equal.)

The reason that it performs so well relative to other methods when sorting an
array of integers is that comparison of two integers and movement of an integer
from one array element to another both take similar amounts of time to that of other
integer operations used in subscript manipulation. When this is coupled with the
fact that Quicksort performs very few subscript manipulations compared to other
sorting methods, the reason why it outperforms other methods when sorting an
integer array, even though the other methods make fewer comparisons and/or data
movements becomes clear.

However most real applications are sorting items which can be thought of as
consisting of two parts: a KEY upon which the comparison of items depends and the
associated DATA. In many if not most cases, the total size of an item is
considerably larger than an integer and the algorithm to compare two KEYs is
more complex than comparing two integers or reals.

To demonstrate the effect of these considerations, I prepared two arrays and
applied several sorting routines to each of them.

The first array consists of 1000000 random bin fixed(31) integers uniformly
distributed in the interval (0,2147483647). For this array, the comparison
criterion is a simple integer comparison.

The second array consists of 1000000 80 byte items. Each item consists of three
bin fixed(31) integers followed by 68 bytes of text. The three integers are
independent random variables. The first two are in the interval [0,99] and the
third is in the interval [0,99999]. For this array, the comparison criterion
is lexicographic ordering on the three integers in left to right order.

The sorting algorithms compared are:

qsortmv: The transliteration of Microsoft's C qsort shown above using byvalue;
qsortmr: The same routine, but using call by reference;
qsortmp: Same as qsortmr but replacing the loop in swap with a call to PLIMOVE;

qsorta: My own implementation, modified to sort the array instead of an array of
pointers;

sorta: My preferred array sorter which uses an auxiliary link array to sort using
binary merge, then rearranges the array in place;

sortl: My binary merge list sorter applied to linked lists containing the same data.

Since no i/o is involved, timing of the sorts is very reproducible. The times do not
vary by more than +- .01 sec. All tests were run on a Thinkpad 600 with no other
applications running.

The following table summarizes the timing results and also shows the number of key
comparisons (C) and data movements (M) performed by each routine. For qsorta, the
theoretical average number of comparisons (A) is shown. For sorta, the theoretical
maximum number of comparisons (X) is shown; this value also applies to sortl. Finally
the number of fixed elements (F) and number of cycles (Y) of the stable permutation are
shown; these values apply to sorta and sortl but not necessarily to the Quicksort
methods since they are unstable and find a different permutation in general unless
there are no equal elements.


4 byte elements 80 byte elements

qsortmv: T 3.63 19.86
qsortmr: T 4.30 17.96
qsortmp: T 5.60 13.47
C 25603868 25773956
M 14723739 14714256

qsorta T 3.57 11.15
C 24760597 25390212
M 10060710 10039911
A 25505599

sorta T 7.34 10.33
C 18715816 18715915
M 1000011 1000015
X 18931570

sortl T 5.16 5.91

F 1 0
Y 12 15


Discussion: Comparison of the times for sortl shows that the more complex comparison has minimal
effect. On the other hand, increasing the item size to a realistic value has a drastic effect on
the relative performance of the various routines. Call by value is faster than call by reference
for one word items, whereas call by reference is faster for 80 byte items. For four byte items, the
calling overhead of PLIMOVE leads to an increased running time of about 1.3 sec while for 80 byte
items there is a 4.5 sec decrease. qsorta is uniformly better than the more involved implementation
based on the Microsoft routine even though it only treats size 2 as a special case. Even though
sorta does far more subscript arithmetic than any of the Quicksort versions, the smaller number of
key comparisons and greatly reduced number of data moves makes it the clear winner among the array
sorters for the 80 byte data. If an an array is not necessary, sortl is the method of choice, since
it eliminates data movement entirely.

glen herrmannsfeldt

2007-10-04, 6:58 pm

James J. Weinkam wrote:
(snip regarding non-stable quicksort, I wrote)

[color=darkred]
> What you say is true, but stability is not the only issue. Quicksort's
> worst case performance is n*2 not n{times}2{log}n; admittedly the worst
> case isn't very likely to occur and there are defenses against it but
> it's still a blemish.


It is, especially since some of the slow cases occurs when the array
is already in order, or close to being in order.

> Also Quicksort's legendary performance applies primarily to its
> application to sorting an array of numbers which is not the typical use
> of sorting in serious applications. The following is from a post I made
> to the PL/I newsgroup when a similar topic was discussed there back in
> 2004:


(snip)

> Another question is why is nearly everyone so enamored with Quicksort?


> Certainly it is an elegant algorithm. Its average performance
> is also faster than most other methods, at least in the textbook
> setting of sorting an array of integers.

(snip)

> The reason that it performs so well relative to other methods
> when sorting an array of integers is that comparison of two integers
> and movement of an integer from one array element to another both
> take similar amounts of time to that of other integer operations
> used in subscript manipulation. When this is coupled with the
> fact that Quicksort performs very few subscript manipulations
> compared to other sorting methods, the reason why it outperforms
> other methods when sorting an integer array, even though the other
> methods make fewer comparisons and/or data movements becomes clear.


For internal sort there is also the cache to consider. I don't know
how much effect that has on quicksort, though.

> However most real applications are sorting items which can be thought
> of as consisting of two parts: a KEY upon which the comparison of items
> depends and the associated DATA. In many if not most cases, the total
> size of an item is considerably larger than an integer and the
> algorithm to compare two KEYs is more complex than comparing two
> integers or reals.


That is true. When I learned about sort algorithms there were three
fast sorts: Quicksort, Shellsort, and Heapsort. Actually, I don't
know so well the details on comparisons vs. data movement for the
three. All three are considered NlogN typical if not worst case.

For C's qsort (which, despite the name, isn't necessarily
quicksort) many people sort a list of pointers instead of the data
array itself. Reasonably often I sort the data array, but usually
it is small enough that I am not really worried about the time.

For the case that started this thread, though, it was a list of
indexes into the array. In that case, the data movement is the
same as for an integer array and it is easy to make a stable
sort.

(big snip)

-- glen

Dr Ivan D. Reid

2007-10-04, 6:58 pm

On Thu, 04 Oct 2007 21:59:29 GMT, James J. Weinkam <jjw@cs.sfu.ca>
wrote in <5ndNi.117826$Pd4.1744@edtnps82>:

> This message has long lines that will display best with line wrap set to 100.


Why? I adjust my text windows for height, rarely for width. 80
columns has been "standard" width for decades now. Assuming anything else is
arrogance of the first water!

--
Ivan Reid, School of Engineering & Design, _____________ CMS Collaboration,
Brunel University. Ivan.Reid@[brunel.ac.uk|cern.ch] Room 40-1-B12, CERN
KotPT -- "for stupidity above and beyond the call of duty".
wclodius@lanl.gov

2007-10-04, 6:58 pm

On Oct 4, 9:08 am, "jk" <*a...@planet.nl> wrote:
> <snip>
>
> This is a bit beyond my scope, but I'd say that the bottleneck for Fortran lies
> in the sorting, the APL-{grade-up} {grade-down}, in other words a different
> approach of sorting. In that operation APL gives you the indices for the array
> to be sorted, while Fortran gives - by a runtime routine - the sorted array.
> Please, see also the later postings.
> Sorting is a big thing in computing. Performance may stand or fall with the
> sorting algorithm. As far as I know, e.g. Dyalog uses Hoare's
> algorithm for sorting - the fastest one known at present (I was told by somebody
> who is supposed to know).


I would hope that Dyalog doesn't use Quicksort. FWIW the somebody
doesn't know.

For finite (machine) integers (and I believe also floats) there are
sorting algorithms of order 0(n). Quicksort's average sorting for
randomly distributed orders of values in the array is 0(n log(n)).
However Quicksort's worst performance (0(n^2)) occurs for arrays that
are already sorted (or almost sorted), and in practice those occur far
more often than would be predicted for random distributions. Quicksort
should not be used on arrays of machine integers or floating point
numbers.

For more complicated types (e.g. character strings, bignums, etc.)
where 0(n) sorting is not practical, the most commonly used
competitors to Quicksort are Heapsort and Mergesort. All three have
"average" 0(n log(n)) performance. Worst case for Heapsort and
Mergesort remains 0(n log(n)) as opposed to Quicksort's (n^2). Note
that Quicksort and Heapsort are largely sort in place algorithms, but
Mergesort can need significantly (~2-3*) more memory. Mergesort
(unlike Quicksort and Heapsort) is stable (preserves relative order of
identical items). The relative performance of these algorithms depends
on how the data is stored (array, linked list, tree), and the relative
costs of comparisons versus memory moves. Modifications of all three
algorithms can change the coefficients, often minimizing memory moves.
Most analyses (and tests) use (comparison and memory) operations
appropriate for machine integers (where the 0(n) methods are more
appropriate), on randomly ordered data. In these tests Quicksort is
usually faster than Heapsort, but slightly slower than Mergesort. If
either stability or performance is more important than minimizing
memory usage, (a sophisticated) Mergesort is to be preferred, if
minimizing memory usage is more important and if Quicksort's worst
case is expected to be uncommon Quicksort is to be preferred,
otherwise Heapsort is to be preferred.

jk

2007-10-05, 3:57 am

Another "little gem" with {grade-up} is in meshing two strings.

In APL:
r <- s[{grade-up}{grade-up} f]
in which s = the two catenated strings and f is the known final order

In K:
r <- s[<<f]

a very powerful utility

jk


Harold Stevens

2007-10-05, 6:58 pm

["Followup-To:" header set to comp.lang.fortran.]

In <slrnfgarsm.b3s.Ivan.Reid@loki.brunel.ac.uk> Dr Ivan D. Reid:

[Snip...]

> 80 columns has been "standard" width for decades now


Indeed. FWIW...

I rarely read any post(er) with runon lines past 80 columns. Already quite
enough noise without that oblivious "style" intruding as well.

JMO; YMMV...

--
Regards, Weird (Harold Stevens) * IMPORTANT EMAIL INFO FOLLOWS *
Pardon any bogus email addresses (wookie) in place for spambots.
Really, it's (wyrd) at airmail, dotted with net. DO NOT SPAM IT.
Kids jumping ship? Looking to hire an old-school type? Email me.
Mike Kent

2007-10-06, 6:57 pm

Bob Smith wrote:

> I'd be surprised if Dyalog uses Hoare's algorithm (also known as
> Quicksort) as it is not a stable sort.


Non-stable sorts /are/ stable if all the items in the
to-be-sorted data are distinct -- so, a fast but non-stable
/sort/ can underly a fast and stable /grade/ (by doing a
lexicographic sort on pairs (datum . index)).

In a grade algorithm the extra work need not be /very/
onerous since

(a) you are permuting the indices so they are easily
at hand already

(b) you only have to do the comparison of original
indices in the case where the data values match


For the case of "enough" things to grade, the balance may
favor fancy efficient non-stable sorts over less efficient
but stable sorts.


Engineering questions that might be asked include

how many is enough?

how dreadful is the penalty for smaller cases
(is it small enough to be neglected)?

is there a crossover point a where you get
back most of the penalty in the not-so-many
cases, and if so where (not forgetting
the cost of the size check)?

what's the arg size distribution likely to
be seen in use?



IAC Dyalog use some kind of modern sort or so I was told
once by Pete or, more likely, someone from their interpreter
implementation team .
TC

2007-10-07, 6:57 pm

On Oct 4, 2:57 pm, se...@panix.com (Seth) wrote:
> Actually, for this underlying problem, you don't need to sort. It can
> be solved in linear time. (You're looking for the M largest out of N,
> which can be done in O(N) comparisons.)
>
> I've seen the algorithm. I'd rather sort.
>
> Seth


actually, that non sorting algorithm might be pretty useful for me, as
I've got a case where I'm scanning across a 9.6 million row by 4
column matrix and sorting that might take too long [sub 200
milliseconds] . . . which if i can be clever about that tail-recursion-
reuses-the-stack feature of dyalog i might be able to achieve

could you point me to a citation for that algorithm

phil chastney

2007-10-08, 3:57 am

wclodius@lanl.gov wrote:
> On Oct 4, 9:08 am, "jk" <*a...@planet.nl> wrote:
>
> I would hope that Dyalog doesn't use Quicksort. FWIW the somebody
> doesn't know.
>
> For finite (machine) integers (and I believe also floats) there are
> sorting algorithms of order 0(n). Quicksort's average sorting for
> randomly distributed orders of values in the array is 0(n log(n)).
> However Quicksort's worst performance (0(n^2)) occurs for arrays that
> are already sorted (or almost sorted), and in practice those occur far
> more often than would be predicted for random distributions. Quicksort
> should not be used on arrays of machine integers or floating point
> numbers.
>
> For more complicated types (e.g. character strings, bignums, etc.)
> where 0(n) sorting is not practical, the most commonly used
> competitors to Quicksort are Heapsort and Mergesort. All three have
> "average" 0(n log(n)) performance. Worst case for Heapsort and
> Mergesort remains 0(n log(n)) as opposed to Quicksort's (n^2). Note
> that Quicksort and Heapsort are largely sort in place algorithms, but
> Mergesort can need significantly (~2-3*) more memory. Mergesort
> (unlike Quicksort and Heapsort) is stable (preserves relative order of
> identical items). The relative performance of these algorithms depends
> on how the data is stored (array, linked list, tree), and the relative
> costs of comparisons versus memory moves. Modifications of all three
> algorithms can change the coefficients, often minimizing memory moves.
> Most analyses (and tests) use (comparison and memory) operations
> appropriate for machine integers (where the 0(n) methods are more
> appropriate), on randomly ordered data. In these tests Quicksort is
> usually faster than Heapsort, but slightly slower than Mergesort. If
> either stability or performance is more important than minimizing
> memory usage, (a sophisticated) Mergesort is to be preferred, if
> minimizing memory usage is more important and if Quicksort's worst
> case is expected to be uncommon Quicksort is to be preferred,
> otherwise Heapsort is to be preferred.


FWIW, my experience is that the execution times of internal sort
routines is rarely a significant factor in the overall timing

.. . . unless, of course, I had to write and debug the damn thing
before I could use it

/phil
phil chastney

2007-10-08, 3:57 am

jk wrote:
> Another "little gem" with {grade-up} is in meshing two strings.
>
> In APL:
> r <- s[{grade-up}{grade-up} f]
> in which s = the two catenated strings and f is the known final order
>
> In K:
> r <- s[<<f]
>
> a very powerful utility


plus the fact that grade-up and grade-down can handle character arrays
with equal ease . . . /phil
jk

2007-10-08, 3:57 am


"phil chastney" <phil.hates.spam@amadeus.munged.eclipse.co.uk> wrote in message
news:U1lOi.755909$Bo7.463152@fe07.news.easynews.com...
> jk wrote:
>
> plus the fact that grade-up and grade-down can handle character arrays with
> equal ease . . . /phil


yes, datatype and even the shape of the array is irrelevant - wonderful!


Chip Coldwell

2007-10-09, 6:58 pm

"James J. Weinkam" <jjw@cs.sfu.ca> writes:

>
> Here is my transliteration of the C function:


Here's my version of quicksort in Fortran:

! Fortran 95 implementation of the quicksort algorithm. This
! subroutine does not directly touch the array it sorts; rather it
! relies on the two callbacks "compare" and "exchange" for that.
! Code inspired by R. Sedgewick, "Algorithms in C".

subroutine quicksort(n, compare, exchange)
integer, intent(in) :: n ! the length of the implied array

! The compare function must return an integer that is
! greater than zero if element(i) > element(j)
! equal to zero if element(i) = element(j)
! less than zero if element(i) < element(j)
interface
integer function compare(i,j)
integer, intent(in) :: i, j
end function compare
end interface

! The exchange subroutine exchanges the elements at locations i and j.
interface
subroutine exchange(i,j)
integer, intent(in) :: i, j
end subroutine exchange
end interface

call subfile(1,n)
return

contains

recursive subroutine subfile(l,r)
integer, intent(in) :: l, r
integer :: i

if (r .le. l) return
i = partition(l,r)
call subfile(l,i-1)
call subfile(i+1,r)
end subroutine subfile

integer function partition(l,r)
integer, intent(in) :: l, r
integer :: i, j

i = l
j = r - 1
do
do while (compare(i,r) .lt. 0)
i = i+1
end do
do while (compare(j,r) .gt. 0)
if (j .eq. l) exit
j = j-1
end do
if (i .ge. j) exit
call exchange(i,j)
end do
call exchange(i,r)
partition = i
end function partition
end subroutine quicksort

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc.
10 Technology Park Drive
Westford, MA 01886
978-392-2426
glen herrmannsfeldt

2007-10-09, 6:58 pm

Chip Coldwell wrote:
(snip)

> Here's my version of quicksort in Fortran:


(snip)

> recursive subroutine subfile(l,r)
> integer, intent(in) :: l, r
> integer :: i
> if (r .le. l) return
> i = partition(l,r)
> call subfile(l,i-1)
> call subfile(i+1,r)
> end subroutine subfile


Despite the obvious convenience it is usual to write quicksort
not using explicit recursion. That is, it is done with an internal
array to save the values that need to saved.

It is also usual to stack the larger half remaining and process the
smaller half. Doing that reduces the stack requirement (either
explicit recursion or built in array) to log2(N).

I don't see any of the methods to reduce N**2 behavior. Without
that, and without the stack preservation above, it is easy
to achieve N levels of recursion. Especially without checking
for only one element remaining before the recursive call.
(Some use a random number to select the pivot point, others
take the middle valued of the first, center, and last point.)

As I understand it, it is faster to not sort the smaller partitions
with quicksort, but to leave them unsorted and run an
insertion sort after the quicksort. I don't know how popular
that one is, though.

Otherwise, it is nice to see the ability to sort an arbitrary
data structure by having the caller supply the compare and
exchange routines. It will be nice when generic pointers are
implemented so one can pass the data to the compare and exchange
routines more conveniently.

-- glen

robert.corbett@sun.com

2007-10-09, 6:58 pm

On Oct 4, 4:26 pm, wclod...@lanl.gov wrote:

> I would hope that Dyalog doesn't use Quicksort. FWIW the somebody
> doesn't know.
>
> For finite (machine) integers (and I believe also floats) there are
> sorting algorithms of order 0(n). Quicksort's average sorting for
> randomly distributed orders of values in the array is 0(n log(n)).
> However Quicksort's worst performance (0(n^2)) occurs for arrays that
> are already sorted (or almost sorted), and in practice those occur far
> more often than would be predicted for random distributions. Quicksort
> should not be used on arrays of machine integers or floating point
> numbers.
>
> For more complicated types (e.g. character strings, bignums, etc.)
> where 0(n) sorting is not practical, the most commonly used
> competitors to Quicksort are Heapsort and Mergesort. All three have
> "average" 0(n log(n)) performance. Worst case for Heapsort and
> Mergesort remains 0(n log(n)) as opposed to Quicksort's (n^2). Note
> that Quicksort and Heapsort are largely sort in place algorithms, but
> Mergesort can need significantly (~2-3*) more memory. Mergesort
> (unlike Quicksort and Heapsort) is stable (preserves relative order of
> identical items). The relative performance of these algorithms depends
> on how the data is stored (array, linked list, tree), and the relative
> costs of comparisons versus memory moves. Modifications of all three
> algorithms can change the coefficients, often minimizing memory moves.
> Most analyses (and tests) use (comparison and memory) operations
> appropriate for machine integers (where the 0(n) methods are more
> appropriate), on randomly ordered data. In these tests Quicksort is
> usually faster than Heapsort, but slightly slower than Mergesort. If
> either stability or performance is more important than minimizing
> memory usage, (a sophisticated) Mergesort is to be preferred, if
> minimizing memory usage is more important and if Quicksort's worst
> case is expected to be uncommon Quicksort is to be preferred,
> otherwise Heapsort is to be preferred.


There are many implementations of quicksort with different
performance characteristics. The problem of quicksort going
quadratic on almost ordered or almost reverse ordered arrays
is well-known and is easily eliminated. There are
implementations of quicksort for which the worst-case
execution time is O(n log(n)). The book *Introduction to
Algorithms* by Cormen, Leiserson, and Rivest gives this as
an exercise:

10.3-3
Show how quicksort can be made to run in O(n lg n) time
in the worst case.

Bob Corbett

glen herrmannsfeldt

2007-10-09, 6:58 pm

robert.corbett@sun.com wrote:

(snip)

> 10.3-3
> Show how quicksort can be made to run in O(n lg n) time
> in the worst case.


The two methods that I know about are:

1) Use random selection for the pivot point.

2) Choose the middle of three (usually first, center, and last) points.

Both still leave the possibility of O(N**2) time, but make it
very unlikely as N increases. In the random case, if one knew
the seed one could set up worst case conditions. For the second
case one wouldn't need to know the seed, though the worst case
should be at least slightly better than O(N**2).

There are other improvements I suggested in a previous post.

-- glen

Seth

2007-10-09, 9:57 pm

In article <1191961201.116361.270520@v3g2000hsg.googlegroups.com>,
<robert.corbett@sun.com> wrote:

>There are many implementations of quicksort with different
>performance characteristics. The problem of quicksort going
>quadratic on almost ordered or almost reverse ordered arrays
>is well-known and is easily eliminated. There are
>implementations of quicksort for which the worst-case
>execution time is O(n log(n)). The book *Introduction to
>Algorithms* by Cormen, Leiserson, and Rivest gives this as
>an exercise:
>
> 10.3-3
> Show how quicksort can be made to run in O(n lg n) time
> in the worst case.


Define "quicksort".

Run quicksort for n lg n time; if it isn't finished, use an n log n
sort instead. (Time is at most 2 n lg n.)

Use the linear median-finder to get the partitioning element. (As a
side effect, you also know which elements are larger and smaller.)

Take the barometer to the building superintendant, and tell him . . .

Seth
Chip Coldwell

2007-10-10, 3:57 am

glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:

> Chip Coldwell wrote:
> (snip)
>
>
> (snip)
>
>
> Despite the obvious convenience it is usual to write quicksort
> not using explicit recursion. That is, it is done with an internal
> array to save the values that need to saved.
>
> It is also usual to stack the larger half remaining and process the
> smaller half. Doing that reduces the stack requirement (either
> explicit recursion or built in array) to log2(N).


OK, I think I'm up to the challenge:

! Fortran 95 implementation of the quicksort algorithm. This
! subroutine does not directly touch the array it sorts; rather it
! relies on the two callbacks "compare" and "exchange" for that.
! Code inspired by R. Sedgewick, "Algorithms in C" and correspondence
! with Glen Herrmannsfeldt.

subroutine quicksort(n, compare, exchange)
implicit none
integer, intent(in) :: n ! the length of the implied array

! The compare function must return an integer that is
! greater than zero if element(i) > element(j)
! equal to zero if element(i) = element(j)
! less than zero if element(i) < element(j)
interface
integer function compare(i,j)
integer, intent(in) :: i, j
end function compare
end interface

! The exchange subroutine exchanges the elements at locations i and j.
interface
subroutine exchange(i,j)
integer, intent(in) :: i, j
end subroutine exchange
end interface

integer, dimension(2, n) :: stack
integer :: sp, left, right, pivot

sp = 1
call push(1, n)

do while(pop(left, right))
if (left .ge. right) cycle
pivot = partition(left, right)
if (pivot .gt. (left+right)/2) then
call push(left,pivot-1)
call push(pivot+1,right)
else
call push(pivot+1,right)
call push(left,pivot-1)
end if
end do
return

contains

subroutine push(l, r)
integer, intent(in) :: l, r

stack(1, sp) = l
stack(2, sp) = r
sp = sp + 1
end subroutine push

logical function pop(l, r)
integer, intent(out) :: l, r

if (sp .gt. 1) then
sp = sp - 1
l = stack(1, sp)
r = stack(2, sp)
pop = .true.
else
pop = .false.
end if
end function pop

integer function partition(l,r)
integer, intent(in) :: l, r
integer :: i, j

i = l
j = r - 1
do
do while (compare(i,r) .lt. 0)
i = i+1
end do
do while (compare(j,r) .gt. 0)
if (j .eq. l) exit
j = j-1
end do
if (i .ge. j) exit
call exchange(i,j)
end do
call exchange(i,r)
partition = i
end function partition
end subroutine quicksort


> As I understand it, it is faster to not sort the smaller partitions
> with quicksort, but to leave them unsorted and run an
> insertion sort after the quicksort. I don't know how popular
> that one is, though.


I didn't attempt that.

> Otherwise, it is nice to see the ability to sort an arbitrary
> data structure by having the caller supply the compare and
> exchange routines. It will be nice when generic pointers are
> implemented so one can pass the data to the compare and exchange
> routines more conveniently.


Or, what is known in C++ circles as "generic programming".

Chip

--
Charles M. "Chip" Coldwell
"Turn on, log in, tune out"
Somerville, Massachusetts, New England
glen herrmannsfeldt

2007-10-10, 3:57 am

Seth wrote:

(someone wrote)

[color=darkred]
> Define "quicksort".


> Run quicksort for n lg n time; if it isn't finished, use an n log n
> sort instead. (Time is at most 2 n lg n.)


> Use the linear median-finder to get the partitioning element. (As a
> side effect, you also know which elements are larger and smaller.)


But the linear median finder is supposed to have n>32.

(See Knuth, "The Art of Computer Programming, v3, section 5.3.3.)

The result may be O(n log n) with a large coefficient if front.
The linear median finder seems to be 15n-163.

> Take the barometer to the building superintendant, and tell him . . .


That is a good way...

-- glen

Seth

2007-10-10, 6:58 pm

In article < VfednceS9Opm0ZHanZ2dneKdnZydnZ2d@comcast
.com>,
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
>Seth wrote:


>
>
>
>
>But the linear median finder is supposed to have n>32.


For limited n, everything is O(1).

>(See Knuth, "The Art of Computer Programming, v3, section 5.3.3.)
>
>The result may be O(n log n) with a large coefficient if front.
>The linear median finder seems to be 15n-163.


The version I learned was (IIRC) 3n + o(n) comparisons. The
bookkeeping may well have exceeded the comparison cost.

Seth
Morten Kromberg

2007-10-11, 3:57 am

On Oct 3, 3:53 pm, "Michael Metcalf" <michaelmetc...@compuserve.com>
wrote:

> "Beliavsky" <beliav...@aol.com> wrote in message
>
> At the time I briefly used APL, around 1981, its fans claimed that, as an
> interpreted language, it could be used very effectively to develop
> algorithms. For production purposes, these would then be translated into
> Fortran. So, the one did not exclude the other.
>
> Regards,
>
> Mike Metcalf


>From a Computer Scientist or Software Engineers point of view, it

makes sense to argue about how equivalent F90 and APL are. From the
users point of view, the comparison is close to meaningless. APL
interpreters allow you to dynamically load, inspect/mine and "play
with" arrays of data (and recently also arrays of objects). APL allows
the domain expert to interactively explore data and leads him or her
down the path of developing functional domain specific notations which
can easily be embedded in production applications.

F90 may be able to compete with APL (and win) in terms of raw
performance, IF you already know exactly what you want to do. But APL
is more typically used as a tool for exploring and creating
specifications, rather than a for "programming" in the modern software
engineering sense (the part of the work that is equivalent to pouring
concrete in a construction project). Some APL systems generate, or are
converted by hand into compiled languages once the problem domain is
fully understood (IF the domain is not in a constant state of flux).

The reasons for translating APL systems which have settled down into a
traditional language CAN be performance, but more often it is that a
large corporation wants to move maintenence into a more predictable
part of the organization (large groups of "cannon fodder" using
traditional programming languages which can be outsourced etc). Many
managers prefer to think of programs and programmers as bits of Lego,
that can be easily replaced - for rather obvious reasons, this feels
much more comfortable - and does actually make a lot of sense as the
creative forces can move on to tackling the next problem rather than
getting bogged down doing maintence and small extensions (and writing
flashy GUI and other stuff which adds a lot to the marketability but
little to the true value of the application).

In finance and a cutting edge engineering fields, the application
subject matter keeps mutating and APL remains the tool of choice for
production applications. There is an oil company who estimate that the
use of APL to adapt their refinery business to advances in chemical
engineering earns them roughly a billion dollars a year. If you want
to be able to adapt to changing market conditions in hours rather than
years, you want the domain experts directly involved in writing
production code. When overseen by managers who know what they are
doing, APL makes this a reality. The competitive edge that this
delivers varies from significant to unmeasurable, if you are in a
sufficiently dynamic marketplace.

The odd thing about APL is that (unlike most other tools for system
architects), the array notation (and the hard work done by the guys
who wrote the interpreters) very often leads to "executable specs"
which compete favourably with hand coded, highly optimised solutions
in compiled languages :-). So it even makes sense to argue about
sorting in F90 vs. APL.

My take on the important difference about the way APL does grading is
that you don't re-organize the data. You get the sort index (the
discusion about whether these are "pointers" is a bit of a red
herring, this is a perfectly safe operation). This allows you to
"instantaneously" switch between viewing the ordered and the unordered
data. You can do further manipulation of/using the sort order. You
have a lot more flexibility in your application this way. For example,
you can use the same index to re-order associated data "facts" on
demand, rather than being forced to re-order everything at once. In a
financial application with several hundred facts per instrument, where
the user can dynamically pick which facts he wants to see, this might
give you one to two orders of magnitude better response in your
application.

It is this sort of flexibility which allows many interpreted APL
systems written by people who are not really "programmers" to run
rings around hand coded code in compiled languages, where the
applications have been objectively decomposed by software engineers to
the point where the ability of domain experts to apply important
knowlege about the data is lost. OK, so a perfect design process is
supposed to extract this knowledge and use it in the design of the
object hiearchy.

More often that not, you can't see the data for all the objects :-)

Chip Coldwell

2007-10-11, 6:58 pm

Morten Kromberg <mkrom@dyalog.com> writes:

>
> There is an oil company who estimate that the use of APL to adapt
> their refinery business to advances in chemical engineering earns
> them roughly a billion dollars a year.


I've heard roughly the equivalent statement made for just about every
programming language on the planet. For a Lisp example:

http://www.paulgraham.com/avg.html

Chip

--
Charles M. "Chip" Coldwell
"Turn on, log in, tune out"
Somerville, Massachusetts, New England
Morten Kromberg

2007-10-11, 6:58 pm

On Oct 11, 4:47 pm, Chip Coldwell <coldw...@gmail.invalid> wrote:
> Morten Kromberg <mk...@dyalog.com> writes:
>
>
> I've heard roughly the equivalent statement made for just about every
> programming language on the planet. For a Lisp example:
>
> http://www.paulgraham.com/avg.html
>
> Chip
>
> --
> Charles M. "Chip" Coldwell
> "Turn on, log in, tune out"
> Somerville, Massachusetts, New England


I think "just about every programming language..." is perhaps a bit of
an overstatement, but I grant you that APL is by far the only one.
There are a number of languages which do generate this kind of
enthusiasm and claims from their users. LISP has been one. More
recently, users of Python and Ruby make these kinds of statements, in
particular about web applications.

APL is especially well suited to a particular type of application:
Very generally, I would describe them as applications with technical
or analytical content which favours a mathematical approach, and also
require the "organization" and "crunching" of decent quantities of
data.

It is true that you won't hear much about APL in Software Engineering
circles today. APL was much more visible in the 70's and 80's than it
is today, and this might give the impression that APL has become a
dead language. However, this is not the case. In the age where all
software was custom built and the agile / dynamic nature of APL lead
people to write everything from accounting packages to crew scheduling
systems in APL. As most of these fields settled down and became
dominated by big players with customizable packages built using
mainstream Software Engineering tools and techniques. And this is "how
it should be"; in the long term, anything written in APL is
susceptible to being taken over by tools which aim for predictability
and stability over new insight and productivity.

Many of the early uses of APL died with the mainframes on which they
ran, and APL vendors (being much like the users of APL: Problem-
rather than technology oriented) were a bit slow to pick up on the
technological revolutions that followed. However, in many financial
and technical applications, APL survived because of the enormous
advantages of having a language which many business and technical
experts find easier to learn and to use (both to read AND to write)
than traditional programming languages. Today, a number of excellent
workstation-based APL systems exist, with object orientation and tight
integration with platforms like .Net. Part of the software development
market is moving towards involving domain experts and developing
domain specific notations, techniques which have been used in APL
circles for 40 years.

Another reason why you won't hear much about APL in Software
Engineering circles is that most APL users don't see themselves as
Software Engineers, they present papers at conferences of chemical
engineers and build successful BI applications. If you ask many APL
programmers what they do for a living they will say they are actuaries
or portfolio managers - not "programmers". However, there are signs
that fashion is swinging in the direction of APL, it does seem that
the APL niche is starting to grow again, thanks to agile methodologies
and a growing awareness that traditional software engineering is WAY
too inefficient. Involving the domain experts is key to achieving REAL
increases in productivity (listen to what Charles Simonyi has been
saying recently). I'm not sure whether this trend will also benefit
LISP, FORTH and SmallTalk, to name a few languages which used to be
worshipped by their users in much the same way as APL still is. These
languages share many of the same characteristics as APL but I'm not
sure how easy they are for the quote non-programmers unquote to learn.

While the three languages I mentioned above can be used in CS courses
to illustrate "taking subjects taught in CS to extremes", APL sems to
be too pragmatic and result-oriented to be much use to Computer
Scientists. So you won't hear them talk much about it, either. But
back in 1975 Edsger Dijkstra did make a wonderful statement about the
language, which tends to put a big smile on the faces of most APL
users who hear it: "APL is a mistake, carried through to perfection.
It is the language of the future for the programming techniques of the
past: it creates a new generation of coding bums".

At the risk of repeating myself, what APL does seem to have going for
it is that it is both an easy language for domain experts (as opposed
to professional programmers, trained in mainstream languages) to
learn, AND a really good platform for building so-called "embedded
domain specific languages".

Morten

Richard Maine

2007-10-11, 6:58 pm

Morten Kromberg <mkrom@dyalog.com> wrote:

> So it even makes sense to argue about
> sorting in F90 vs. APL.


No it doesn't, because as pointed out here multiple times before, there
is no sorting in the Fortran language. And almost none of the
"discussion" in question in this thread seems to have anything at all to
do with Fortran. For example, most of the mention of pointers is rather
explicitly about the usual implementation of C pointers, and does not
apply to Fortran ones, which are very different.

> My take on the important difference about the way APL does grading is
> that you don't re-organize the data.


Well, that's just like the Fortran way, insomuch as the Fortran way is
whatever the author of the particular sort package wants it to be. I
have personally seen (and written, for that matter) sorting code in
Fortran that did exactly that; it isn't particularly unusual.

I don't really see the point of this whole "discussion". It has somewhat
the flavor of trolling for a fight, trying to turn this into some kind
of language war. If you want to talk about sorting, I suggest that there
are directly applicable language-independent groups for that. I see no
language issue worth discussing.

You don't really need to defend why you like and use APL in a Fortran
newsgroup. Actually, I'm sure that many of us accept that without need
for argument.

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
Morten Kromberg

2007-10-11, 6:58 pm

On Oct 11, 6:11 pm, nos...@see.signature (Richard Maine) wrote:
> Morten Kromberg <mk...@dyalog.com> wrote:
>
> No it doesn't, because as pointed out here multiple times before, there
> is no sorting in the Fortran language. And almost none of the
> "discussion" in question in this thread seems to have anything at all to
> do with Fortran.
>
> [Stuff Snipped]
>
> I don't really see the point of this whole "discussion". It has somewhat
> the flavor of trolling for a fight, trying to turn this into some kind
> of language war. If you want to talk about sorting, I suggest that there
> are directly applicable language-independent groups for that. I see no
> language issue worth discussing.
>
> You don't really need to defend why you like and use APL in a Fortran
> newsgroup. Actually, I'm sure that many of us accept that without need
> for argument.


I have to agree with the above. I guess I was getting tired of a
discussion which also seemed to have nothing with APL (despite the
subject line). My attempt to bring it back round to a discussion on
sorting in APL was probably misguided. I apologise to anyone who feel
that I have been wasting your time, and will do my best to resist the
temptation to dive back in :-)

Morten

jk

2007-10-11, 6:58 pm

>
> I have to agree with the above. I guess I was getting tired of a
> discussion which also seemed to have nothing with APL (despite the
> subject line). My attempt to bring it back round to a discussion on
> sorting in APL was probably misguided. I apologise to anyone who feel
> that I have been wasting your time, and will do my best to resist the
> temptation to dive back in :-)
>
> Morten


There's nothing to apologize.
I brought up the example of the sorting and I feel (a bit) responsible. That
example (Phil Benkard's ROUNDING OFF) was relevant, I think, and I was at that
time disappointed about F90, "sold" as an array language, which wasn't, in my
view, in the sense of the APL's than available.
My example on generating Fortran code with APL is historical (APL93 Toronto),
and I myself did some experimenting with it, for the same objective: in my
department they wanted "readable" code. I'm still happy I didn't finish it ...

The example on the merits of APL (case) you mentioned is correct and there's no
comparable example in any other language as far as I know, except maybe K/Kdb,
which is in fact an APL as well - both statements nobody would dispute.

Jan Karman







Seth

2007-10-11, 9:58 pm

In article <1192118493.661220.155040@o3g2000hsb.googlegroups.com>,
Morten Kromberg <mkrom@dyalog.com> wrote:

> As most of these fields settled down and became
>dominated by big players with customizable packages built using
>mainstream Software Engineering tools and techniques. And this is "how
>it should be"; in the long term, anything written in APL is
>susceptible to being taken over by tools which aim for predictability
>and stability over new insight and productivity.


That's anything _stable_. Stuff changes. I've seen what happens when
someone tries to translate out of an APL-based language for political
reasons. I told them not to.

>Another reason why you won't hear much about APL in Software
>Engineering circles is that


"'Software Engineering' - the term fills a necessary gap." -- David
Gries

Seth
robert.corbett@sun.com

2007-10-11, 9:58 pm

On Oct 9, 9:57 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> Seth wrote:
>
> (someone wrote)
>
>
> But the linear median finder is supposed to have n>32.
>
> (See Knuth, "The Art of Computer Programming, v3, section 5.3.3.)
>
> The result may be O(n log n) with a large coefficient if front.
> The linear median finder seems to be 15n-163.


The formula determines the exact number of comparisons that the
naive algorithm needs to do to find the median. Extending the
formula for small values would add complexity for no good
purpose. Note that at the end of the section, Knuth says that
a more sophisticated algorithm requires fewer comparisons.

Note that there is no need to find the median of the entire
subarray in order to ensure that quicksort remains O(n lg n).
If the partition element is chosen from any fixed percentage of
the elements of the subarray, the worst-case behavior is
O(n lg n). For example, if instead of the median-of-three
that is commonly used, the sort uses the median-of-one-percent,
the worst-case is O(n lg n).

Bob Corbett


glen herrmannsfeldt

2007-10-12, 3:57 am

robert.corbett@sun.com wrote:

(snip regarding quicksort)

> Note that there is no need to find the median of the entire
> subarray in order to ensure that quicksort remains O(n lg n).
> If the partition element is chosen from any fixed percentage of
> the elements of the subarray, the worst-case behavior is
> O(n lg n). For example, if instead of the median-of-three
> that is commonly used, the sort uses the median-of-one-percent,
> the worst-case is O(n lg n).


So if you use the next to the lowest or next to the highest
of the partition it is then O(n log n)? That is, the worst
case for median of three.

-- glen

robert.corbett@sun.com

2007-10-12, 6:58 pm

On Oct 11, 10:28 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu>
wrote:
> robert.corb...@sun.com wrote:
>
> (snip regarding quicksort)
>
>
> So if you use the next to the lowest or next to the highest
> of the partition it is then O(n log n)? That is, the worst
> case for median of three.


The worst case for median-of-N, for any fixed N is O(n^2).
The worst case for median of-N-percent, where N is fixed
and 0 < N <= 100 is O(n lg n).

Bob Corbett

Martin Neitzel

2007-10-12, 6:58 pm

>beliavsky@aol.com wrote:
>
>Bob Bernecky did so, and he wrote a fine description why F90 is
>still miles away from APL. I'll dig out the exact APL QQ reference
>tomorrow.


Well, neither "tomorrow" nor "APL QQ" were correct but here's the
reference, at last:

Robert Bernecky: "Fortran 90 Arrays", ACM SIGPLAN Notices Vol.26 No.2,
Feb. 1991, pages 83--98

Martin Neitzel
glen herrmannsfeldt

2007-10-12, 6:58 pm

robert.corbett@sun.com wrote:

(snip on quicksort)

> The worst case for median-of-N, for any fixed N is O(n^2).
> The worst case for median of-N-percent, where N is fixed
> and 0 < N <= 100 is O(n lg n).


OK, yes. Somehow I didn't read it right the first time.

But it is also O(N log N) for median of 0.000000001%, too,
which shows that O() isn't a very good measure for real
problems.

-- glen

James J. Weinkam

2007-10-12, 9:58 pm

glen herrmannsfeldt wrote:
> robert.corbett@sun.com wrote:
>
> (snip regarding quicksort)
>
>
> So if you use the next to the lowest or next to the highest
> of the partition it is then O(n log n)? That is, the worst
> case for median of three.


No, you are missing the point, Glen. As soon as you sample any *fixed*
percentage (i.e., stated in advance and held constant throughout the
procedure) of each sub array and use the median of that as the dividing
point for that subarray the worst case of the Quicksort algorithm
becomes asymptotically nlog(n) albeit with a coefficient that is
inversely related to the fixed percentage. One must keep in kind the
meaning of Big O notation.
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com