For Programmers: Free Programming Magazines  


Home > Archive > APL > December 2006 > APL vs. SQL









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author APL vs. SQL
phil chastney

2006-12-11, 6:59 pm

Morten Kromberg wrote:
> Google Groups has made a complete mess of the messages in this
> discussion (perhaps not surprising), so I'm picking a random message
> to respond to...


....when, in fact, you should've oughta started a new thread

> <snip>
>
> P.P.S. I have only experienced a small number of cases where SQL
> databases performed better than "hand-coded" APL solutions, and that
> was usually only after a room full of men in white coats had created
> the right indexes, oiled the disk drives, and you made exactly the
> query which you first thought of. And don't even THINK about asking
> the database server to join your data for you, if you want to keep
> the coffee budget under control ;-)


I can only surmise that your experience encompasses small datasets (?)

you can join data in APL, sure, but the result is constructed somewhere,
be it disk or memory -- the value of SQL's JOIN is that the table thus
defined is not necessarily constructed, because the SQL interpreter
looks ahead at the WHERE clauses and at the column names, and constructs
(if necessary) the smaller table thus defined

OK, some day, when I'm dead and buried, APL will allow a qualifier on
the outer product, to provide an INNER JOIN capability, and the APL
interpreter will also look ahead to the selection criteria, without
applying the join function, and it too will construct the smaller table

and I shall look down(?) and say, "damn, but I wish they'd done that 25
years ago!"

the nearest approximation to this happy state that I know of, is/was
Elsevier's extension to IBM's APL, which allowed you to embed SQL
statements into APL code -- it required AP calls, but we used to embed
SQL statements straight into the APL code, comment them out, and prefix
the commented block with a call to an APL utility which did the
necessary dirty work -- it all looked rather nice in the listings

regards . . . /phil



AA2e72E

2006-12-11, 6:59 pm

Consider the need to produce an ad hoc report, say, an analysis of new
business volumes, and the tools are SAS and APL. SAS will use the data
directly i.e from the corporate new business database whereas APL,
unless it does the same, will require an extract of the same data as a
native file first.

The APL solution will be more expensive because of the overhead of
creating the extract file on an ongoing basis both in terms of time and
costs. The business objective is simply to have the ad hoc reports:
sooner and cheaper is better. What tool will be preferred?

jk

2006-12-11, 6:59 pm


"phil chastney" <phil.hates.spam@amadeus-info.munged.freeserve.co.uk> wrote
in message news:CdIdh.75647$5m3.8513@fe01.news.easynews.com...
> Morten Kromberg wrote:


[...]
>
> the nearest approximation to this happy state that I know of, is/was
> Elsevier's extension to IBM's APL, which allowed you to embed SQL
> statements into APL code -- it required AP calls, but we used to embed SQL
> statements straight into the APL code, comment them out, and prefix the
> commented block with a call to an APL utility which did the necessary
> dirty work -- it all looked rather nice in the listings
>
> regards . . . /phil


yes, that was ADI*PLUS. Elsevier provided a time-sharing service, like
IP-Sharp Toronto; connection was free, you only paid for the use of the
system per millisecond.
Adrian Smith has discussed the subject in detail in his excellent book
A Design Handbook for Commercial Systems, boiling down the fuzz to
the simplist possible concept.
(In the meantime Chris Date has admitted having made many mistakes -
I recently red that somewhere, but forgot the source - please, forget it).

jk







Phil Last

2006-12-11, 6:59 pm

phil chastney wrote:
> OK, some day, when I'm dead and buried, APL will allow a qualifier on ...
> ...
> and I shall look down(?) and say, "damn, but I wish they'd done that 25
> years ago!"

Phil, your surely not intending to leave us THAT early!

jk

2006-12-11, 6:59 pm


"phil chastney" <phil.hates.spam@amadeus-info.munged.freeserve.co.uk> wrote
in message news:CdIdh.75647$5m3.8513@fe01.news.easynews.com...
>

[...]
>
> the nearest approximation to this happy state that I know of, is/was
> Elsevier's extension to IBM's APL, which allowed you to embed SQL
> statements into APL code -- it required AP calls, but we used to embed SQL
> statements straight into the APL code, comment them out, and prefix the
> commented block with a call to an APL utility which did the necessary
> dirty work -- it all looked rather nice in the listings
>
> regards . . . /phil


In Lancaster (1996?) Diana, Jim and Stephen were handing out diskettes
with their COVIRT system to everybody around them. COVIRT is a db
system with entire tables on one component, rather than records.
COVIRT needs less than 100Kb, including WS, manual & helpfile.
Quite a few APL-ers have built domestic db-systems, Bernard Houben made
it a product, using Dyalog + Causeway, and called it IQL - it's still
available,
I believe (see his website).
jk


phil chastney

2006-12-11, 6:59 pm

jk wrote:
>
> <snip>
> (In the meantime Chris Date has admitted having made many mistakes -
> I recently read that somewhere, but forgot the source - please, forget it).


no, no, no -- please, if you can dredge it up, I'd like to see that

Chris Date is always an entertaining speaker, a real tub-thumper, but
--oh dear-- like other fundamentalists, he seems incapable of
acknowledging alternative points of view

there was a good paper in the SIGMOD journal a few years back, where the
author traced the changes in Date's Gospel over the years -- some people
treasure different editions of his Introduction to Database Systems
the way wine snobs treasure different vintages

one topic that irks me no end is Date's fulminations against nulls -- I
am convinced he has got it all wrong

if we consult the word of Codd, as written in "A Relational Model of
Large Shared Data Banks", I think we find the source of the problem --
Codd should never have introduced tuples

but that's another story, for another day and another ng

if you can find that reference, I'd be most grateful
regards . . . /phil
phil chastney

2006-12-11, 6:59 pm

jk wrote:

> In Lancaster (1996?) Diana, Jim and Stephen were handing out diskettes
> with their COVIRT system to everybody around them. COVIRT is a db
> system with entire tables on one component, rather than records.
> COVIRT needs less than 100Kb, including WS, manual & helpfile.
> Quite a few APL-ers have built domestic db-systems, Bernard Houben made
> it a product, using Dyalog + Causeway, and called it IQL - it's still
> available,
> I believe (see his website).


the problem with this approach is that APL handles _data_, so that (with
a few exceptions) each intermediate result is made manifest (in fact,
the standard requires it)

a half-decent SQL interpreter, OTOH, manipulates addresses -- the
structure of the SQL statement requires the selection criteria to be
included in the SELECT statement (and not applied later, as in APL), so
that the dereferencing of these addresses can be left until later --
this is essential, if we need to handle large data volumes

there are 2 ways aroung this:

one: take the data manipulation language out of APL i.e, enclose it with
quotes, and then compile APL from the string

or two: take over the handling of addresses in APL -- I gave a workshop
in Lancaster on doing just that -- it worked well, but I would not
propose it as a general method, because it exposes too much low-level
information to people like developers and users, who should really be
spending their time thinking about the application

all the best . . . /phil
phil chastney

2006-12-11, 6:59 pm

Phil Last wrote:

> phil chastney wrote:
>
>
> Phil, your surely not intending to leave us THAT early!
>

I have no illusions, Phil -- I know they're out to get me

they're sitting there on the code, waiting till I pop my clogs, before
releasing it to the rest of the world

well, here come those nice young men in their clean white coats -- must
be time for my medication . . . /phil
jk

2006-12-11, 6:59 pm


"phil chastney" <phil.hates.spam@amadeus-info.munged.freeserve.co.uk> wrote
in message news:DPSdh.81976$5m3.70604@fe01.news.easynews.com...
> jk wrote:
>
>
> the problem with this approach is that APL handles _data_, so that (with a
> few exceptions) each intermediate result is made manifest (in fact, the
> standard requires it)
>
> a half-decent SQL interpreter, OTOH, manipulates addresses -- the
> structure of the SQL statement requires the selection criteria to be
> included in the SELECT statement (and not applied later, as in APL), so
> that the dereferencing of these addresses can be left until later --
> this is essential, if we need to handle large data volumes
>
> there are 2 ways aroung this:
>
> one: take the data manipulation language out of APL i.e, enclose it with
> quotes, and then compile APL from the string
>
> or two: take over the handling of addresses in APL -- I gave a workshop in
> Lancaster on doing just that -- it worked well, but I would not propose it
> as a general method, because it exposes too much low-level information to
> people like developers and users, who should really be spending their time
> thinking about the application
>
> all the best . . . /phil


yes, and that's a 2nd reason why i love K(db); any type of somewhat
structured data is mash for Kdb, Excel files, Access-files, comma or
HT separated vectors &c. I once published an article in a Dutch
DB-Magazine typifying Kdb as an "insatiable omnivore", and it really is.
(I'll do my best to trace the source about Date)
jk



Morten Kromberg

2006-12-11, 6:59 pm

phil chastney skrev:
> ...when, in fact, you should've oughta started a new thread


You are right... Or perhaps a number of new threads...

> I can only surmise that your experience encompasses small datasets (?)


Not necessarily... Especially when datasets get large and the joins are
complex, you often see people selecting data from tables on some good
keys and doing the additional filtering and joining locally. This
comment is perhaps not quite fair, because often the problem is simply
that the server is overloaded - it becomes a huge bottleneck. But it is
often a real problem.

Obviuosly, if the data really IS humungous - or complex, using SQL will
be the easiest way to code whatever manipulation you need. That doesn't
mean it will be fast. The other good reason for putting stuff in an SQL
database is that is a good way to share data because all apps know how
to access it. But SQL because it has the best performance for a given
problem? As I said, almost never!

> you can join data in APL, sure, but the result is constructed somewhere,
> be it disk or memory -- the value of SQL's JOIN is that the table thus
> defined is not necessarily constructed, because the SQL interpreter
> looks ahead at the WHERE clauses and at the column names, and constructs
> (if necessary) the smaller table thus defined


.... when the optimisation works and the server is not too busy.

OK, I'm overdoing this a bit - there are lots of situations where an
SQL database is a good solution. But generally an easier, faster
solution than other alternatives available to APL users? No way... You
are nearly always better off making a suitable extract to APL component
files and working on that. You're going to have to make an extract
anyway, because the sym isn't going to let you do your queries on
the production tables :-)

AA2e72E

2006-12-11, 6:59 pm


But:
1. Whatever the bottlenecks with the database server, the effect is
identical on all applications using it, be they written in C#, C++ or
APL. In other words, the problem is external.
2. No you would not be allowed to access productions tables; however
this does not mean that you have to have extracts. Usually, you would
be able to use Views (read only versions of the tables).

sa

2006-12-11, 6:59 pm

as you may already know, kdb offers a different solution to the
problem of how to rationalize the relationship between general
programming language and database manipulation language: the
latter (either SQL or KSQL) is *part* of the former. that is,
expressions in the DBL are confections of more general expressions
in the PL. "relational tables" are first-class denizens of the
datatype universe.

see the wealth of materials here:

http://www.kx.com/developers/documentation.php

and the white paper on kdb here:

http://www.kx.com/papers/kxdb-request.php

(registration for the latter required)

contrary to assertions made elsewhere, kdb supports primitive
date and time datatypes, and scales efficiently to very large
datasets.

but APLers (at least those with a taste for theory) should pay
more attention to the way kdb reduces the objects of the database
world to first- and higher-order "maps" of index-domains and
value-ranges.

note that kdb+ originated as an *application* written in k.
we're familiar with similar past efforts along these lines --
APL\DI, for example. and i believe that randy macdonald has
been using a home-grown version of this sort of thing for years.
but kdb+ is *not* one of these. it is a general programming
language in which arrays and relational tables are both first-
class citizens, and where primitives like + and * are orthogonal
over both datatypes. e.g. tables and arrays can be added and
multiplied, transformed in various ways, &c.

i think i've got most of this right, although i have no doubt
someone like mike rosenberg or ed bierly, who are both actively
involved in kdb+ projects, could do a better job.


"Morten Kromberg" <mkrom@dyalog.com> wrote in message news:1165494864.856286.178450@80g2000cwy.googlegroups.com...
> phil chastney skrev:
>
> You are right... Or perhaps a number of new threads...
>
>
> Not necessarily... Especially when datasets get large and the joins are
> complex, you often see people selecting data from tables on some good
> keys and doing the additional filtering and joining locally. This
> comment is perhaps not quite fair, because often the problem is simply
> that the server is overloaded - it becomes a huge bottleneck. But it is
> often a real problem.
>
> Obviuosly, if the data really IS humungous - or complex, using SQL will
> be the easiest way to code whatever manipulation you need. That doesn't
> mean it will be fast. The other good reason for putting stuff in an SQL
> database is that is a good way to share data because all apps know how
> to access it. But SQL because it has the best performance for a given
> problem? As I said, almost never!
>
>
> ... when the optimisation works and the server is not too busy.
>
> OK, I'm overdoing this a bit - there are lots of situations where an
> SQL database is a good solution. But generally an easier, faster
> solution than other alternatives available to APL users? No way... You
> are nearly always better off making a suitable extract to APL component
> files and working on that. You're going to have to make an extract
> anyway, because the sym isn't going to let you do your queries on
> the production tables :-)
>



phil chastney

2006-12-11, 6:59 pm

Morten Kromberg wrote:
>
> <snip>
>
> OK, I'm overdoing this a bit - there are lots of situations where an
> SQL database is a good solution. But generally an easier, faster
> solution than other alternatives available to APL users? No way... You
> are nearly always better off making a suitable extract to APL component
> files and working on that. You're going to have to make an extract
> anyway, because the sym isn't going to let you do your queries on
> the production tables :-)


I'm not quite sure what your position is here -- at one point, I thought
you wanted to do the data administration in APL, but here you seem to
accept the idea that data admin is done in SQL, while APL is used for
processing on "a suitable extract"

as a division of labour, then SQL for admin and APL for processing seems
like a good idea -- I've used that combination more than once

mind you, I'm less happy with that idea of "a suitable extract"

I've seen this sort of thing -- they'd come in, access the corporate db,
lock out 1600 rows, download these rows to their PCs, process the data
in Excel using VB -- roundabout 4pm, they'd upload their edits, unlock
the 1600 rows, and go home

total bloody nightmare: security, integrity, currency -- all compromised

I blame that 70s utopian dream of a single corporate database -- but
that's another matter, and hardly central to this ng

I wonder where you gained your unhappy experience of SQL? -- database
products (Oracle, SQL Server, ODBC) were more hype than substance for a
long while -- it's not easy to forgive their vendors for all the grief
their oversold products have caused, but times have changed -- SQL can
be quite maddening at times, but I see little point in duplicating in
APL work that has already been by a database vendor

you may or may not agree -- like I said, I'm not sure what your position
is here . . . /phil
Paul Mansour

2006-12-11, 6:59 pm


phil chastney wrote:
> Morten Kromberg wrote:
>
> I can only surmise that your experience encompasses small datasets (?)
>
> you can join data in APL, sure, but the result is constructed somewhere,
> be it disk or memory -- the value of SQL's JOIN is that the table thus
> defined is not necessarily constructed, because the SQL interpreter
> looks ahead at the WHERE clauses and at the column names, and constructs
> (if necessary) the smaller table thus defined


I think the key term there was "hand-coded". I don't think Morten is
talking about doing outer products on large tables willy nilly. Doing
joins on large data sets will invlove manipulating indices and only
actually assembling what you need. Now it would certainly be nice if
the interpreter handled relations as first class objects ala kdb+, but
since APL is a general programming language (while SQL is an
implementation of the relational algebra) you can do the optimisation
yourself -- and with classes in Version 11 of Dyalog, it's possible to
implement a object model of tables and columns, and get essentially
first class tables.

admin9974@hotmail.com

2006-12-11, 6:59 pm

As most of the regulars here are surely aware, there were relational
databases built
in APL long before SQL was a commercially viable entity. RDS and other
systems comes to mind.

At this point, SQL has long been the "lingua franca" of database
management everywhere, like it or not. To my mind, it is one of the
better standards to have emerged. But still, how do you debug a really
long and convoluted SQL statement, subqueries and whatnot?

One simple idea I have kept from the early APL systems, and which have
been of considerable use even in SQL enviroments, is the idea of a "hit
bit column" (there may be a better term).
The idea being that you create an extra boolean column, and then push
0:s and 1:s into it, one "WHERE xxx" step at a time. You can then
follow the development of the result set at each
intermediate step. Very handy and debug-friendly.

Of course, there may be lots of situations where this is not a good
idea. But
in "offline" reporting systems I=B4ve seen it work like a charm.

phil chastney

2006-12-11, 6:59 pm

admin9974@hotmail.com wrote:
> <snip>
>
> At this point, SQL has long been the "lingua franca" of database
> management everywhere, like it or not. To my mind, it is one of the
> better standards to have emerged. But still, how do you debug a really
> long and convoluted SQL statement, subqueries and whatnot?


it's a nightmare, no question on that score

in fact, even a seemingly simple outer join can cause severe headaches

regards . . . /phil
phil chastney

2006-12-11, 6:59 pm

Paul Mansour wrote:

> <snip>
> I think the key term there was "hand-coded". I don't think Morten is
> talking about doing outer products on large tables willy nilly. Doing
> joins on large data sets will invlove manipulating indices and only
> actually assembling what you need. Now it would certainly be nice if
> the interpreter handled relations as first class objects ala kdb+, but
> since APL is a general programming language (while SQL is an
> implementation of the relational algebra) you can do the optimisation
> yourself -- and with classes in Version 11 of Dyalog, it's possible to
> implement a object model of tables and columns, and get essentially
> first class tables.


I take your point on hand-coding -- personally, I belong to the "how can
I cobble together an acceptable solution from existing resources,
without having to do too much work" school of thought

Morten's argument seemed to permit hand-coded APL, but not hand-coded
SQL, which seems to me like cheating -- but that's OK, because the terms
of the contest were never specified anyway

I also take your point on first class tables -- if we consider a
continumm of solutions from all-APL to all-SQL, with the middle bit a
combination of APL and SQL, then first class tables make the all-APL
band a bit wider

there will remain, however, problems where sharing, security, indexing,
record-locking, authorisation, backup, etc, etc are important, and it
will be some time before standard APL objects can compete in those areas

if APL continues to improve in data handling, and/or SQL improves in
data processing, then I shall be a bit happier -- OK, no one language
can cover everything, but I see no reason why APL shouldn't more-or-less
include SQL as a subset

I'm interested in your observation that "SQL is an implementation of the
relational algebra" -- you might mention this to Chris Date, who seems
to think otherwise -- there's an exercise question in one of his books,
"is SQL a relational algebra or a relational calculus?" -- it took me
some time to figure that one out

all the best . . . /phil

P.S: on reflection, I see no reason why APL shouldn't include relational
algebra -- but it would need more than a DML to make it a proper
database system
Paul Mansour

2006-12-11, 6:59 pm

phil chastney wrote:

> I'm interested in your observation that "SQL is an implementation of the
> relational algebra" -- you might mention this to Chris Date, who seems
> to think otherwise -- there's an exercise question in one of his books,
> "is SQL a relational algebra or a relational calculus?" -- it took me
> some time to figure that one out


Yes, I almost wrote "imperfect implementation..." I agree
whole-heartedly with Chis Date on this (as well as with his view of
nulls, which I know you disagree with - I must say that after going
thru the process of implementing a RDMS in APL, and considering the
pros and cons of nulls, both in implementation and from the users
perspective, it was real pleasure to simply ban them! No sense getting
into an argument on this - we can just agree to disagree ;) )

> P.S: on reflection, I see no reason why APL shouldn't include relational
> algebra -- but it would need more than a DML to make it a proper
> database system


I agree. As you noted above, APL and the relational model go together
very well (in fact, the APL interpreter (at leasts the dyalog one)
implements multiple assignment in exactly the same way as the RM as
specified by Date/Darwen).

Morten Kromberg

2006-12-11, 6:59 pm

phil chastney skrev:
> Morten's argument seemed to permit hand-coded APL, but not hand-coded
> SQL, which seems to me like cheating -- but that's OK, because the terms
> of the contest were never specified anyway


If so, it is only because my imagination fails me: What would you
consider to be "hand-coded SQL"? Stored procedures?

But you are right, my whole point is that havinging the data in APL
allows you to use your knowledge and cheat every which way you can. You
can take advantage of application-specific knowledge about the data,
which an SQL database CANNOT HAVE. As far as I know, the only
application-specific thing you can tell it is how to build indexes(?).

Hand coding often (but not always!!!) means more work for the
programmer, the point is that with APL this is bearable and the results
can still be clean and maintainable (and orders of magnitude faster).

I think extending APL in the direction of having better support for
queries is VERY interesting... Arrays of objects already allow some
very query-like expressions...

((Sales.Quantity>10000)/Sales).Amount x <- 0.9 // x is
multiplication

.... Is a valid expression in Dyalog v11, on an array of instances of
something with fields or properties called Quantity and Amount. The
double appearance of Sales makes one think "something should be done".
Do we need to extend "dot" so we could write:

(Sales./Quantity>1000).Amount x <- 0.9

.... etc ... kdb has some interesting ideas, too - some of which might
be worth adding to APL (although we'll never catch Arthur for raw
speed). I think we have ideas for another decade or two of language
evolution ...

phil chastney

2006-12-11, 6:59 pm

Paul Mansour wrote:
> phil chastney wrote:
>
>
>
> I agree. As you noted above, APL and the relational model go together
> very well (in fact, the APL interpreter (at leasts the dyalog one)
> implements multiple assignment in exactly the same way as the RM as
> specified by Date/Darwen).


thanks for your reply, Paul, and thanks especially for the tone of sweet
reason in which you replied

I was wondering why my last msg looked odd, and it was only after
sending it that I realised that some comments applied to SQL (as DML),
and some to the database infrastructure as a whole, without
distinguishing between them -- naughty

all the best . . . /phil



phil chastney

2006-12-11, 6:59 pm

Morten Kromberg wrote:

> phil chastney skrev:
>
>
>
> If so, it is only because my imagination fails me: What would you
> consider to be "hand-coded SQL"? Stored procedures?


stored procedures apart, you can tinker with what data you store
(provided you're not soem sort of purist)

one is normalisation - 3NF is a fine and lovely thing, but I wouldn't
let it get in the way fo crucial retrieval speeds -- other stuff like
5NF and 6NF can come out excessively expensive in the time spent joining
tables

another is derived data -- I prefer to store raw data only, but comments
as above (plus the fact that check totals sometimes don't check, so then
I'll store bothe the total and its breakdown)

I would include pre-joined tables under the heading of "derived" data

then there's the stuff nobody admits to: denormalisation, i.e, splitting
a table horizontally or vertically, but since I've never done that sort
of thing, I don't know much about it

then, of course, there's indexing and clustering

none of this is SQL tuning, so my refernce to "hand-coded SQL" was
misleading, but it is database tuning, which can help SQL _appear_ faster

> I think extending APL in the direction of having better support for
> queries is VERY interesting... Arrays of objects already allow some
> very query-like expressions...
>
> ((Sales.Quantity>10000)/Sales).Amount x <- 0.9 // x is
> multiplication
>
> ... Is a valid expression in Dyalog v11, on an array of instances of
> something with fields or properties called Quantity and Amount. The
> double appearance of Sales makes one think "something should be done".
> Do we need to extend "dot" so we could write:
>
> (Sales./Quantity>1000).Amount x <- 0.9
>
> ... etc ... kdb has some interesting ideas, too - some of which might
> be worth adding to APL (although we'll never catch Arthur for raw
> speed). I think we have ideas for another decade or two of language
> evolution ...
>

hells bells, Morten -- do I have to wait till I'm 110 before I can get
my hands on this stuff? . . . /phil
Morten Kromberg

2006-12-11, 6:59 pm

phil chastney skrev:

> hells bells, Morten -- do I have to wait till I'm 110 before I can get
> my hands on this stuff?


We've doubled the size of the developer team in the last 12 months...
But these things always take a frustrating amount of time to sort out.
It's too early to say exactly where we are going or when we will get
there. All I can say at this point is that the pace should increase.

phil chastney

2006-12-11, 6:59 pm

Morten Kromberg wrote:
> phil chastney skrev:
>
>
>
>
> We've doubled the size of the developer team in the last 12 months...
> But these things always take a frustrating amount of time to sort out.
> It's too early to say exactly where we are going or when we will get
> there. All I can say at this point is that the pace should increase.
>

point taken -- please forgive my impatience . . . /phil
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com