Code Comments

How does the SORT statement work?
I know there was a thread posted here earlier about "SORT", but it seems
to be referring to "SORT cards", and I think that's something different than
the "SORT statement" I'm trying to learn more about. Here's the syntax of
the SORT statement:

SORT filename-1
ON ASCENDING KEY data-name-1
ON DESCENDING KEY data-name-2
...

WITH DUPLICATES IN ORDER
COLLATING SEQUENCE IS alphabet-name

INPUT PROCEDURE IS procedure-name-1a THROUGH procedure-name-1z
USING filename-2a, filename-2b, ...

OUTPUT PROCEDURE IS procedure-name-2a THROUGH procedure-name-2z
GIVING filename-3a, filename-3b, ...

where the ... isn't literally part of the syntax, but means that the
previous terms can be repeated arbitrarily many times.

So first of all, what does this statement do? Do you provide it with an
unsorted file (filename-1) and it sorts it? Why does the INPUT PROCEDURE
clause accept multiple filenames, and how do each of these files affect the
process, and similarly for the OUTPUT PROCEDURE?

I'm guessing it's something like the INPUT PROCEDUREs are called to load
the records into memory, and the SORT statement does some comparison based
on the COLLATING SEQUENCE, and then uses the OUTPUT PROCEDUREs to save the
records somewhere; traditionally, the input and output comes from files, but
this is not necessarily so. Am I still on track?

Is the reading of records into memory done 1 record at a time, or does
it load multiple records? If it's one at a time, wouldn't you need at least
2 records in memory to do a meaningful comparison to decide on a sort order?
If it's multiple records, wouldn't you need to specify enough memory in your
DATA DIVISION to store those multiple records?

Any idea what sorting algorithm is used? (E.g. mergesort, bubblesort,
quicksort, insertion sort, etc.), or is that implementation defined?

- Oliver


Oliver Wong
03-07-06 11:56 PM


Re: How does the SORT statement work?
"Internal SORT - class 101" <G>

pre-02 Standard:

1) The SORT statement takes one OR more input files in any order (random or
already sorted) and creates a single output file (or multiple copies of the
same file) in the order determined by the set of SORT keys specified in the
SORT statement.  The position of those keys is determined by the record
layout under the SD which is a VIRTUAL file for records "in transit" from
input to output.

2) If NO input or output procedure is specified, then all the I/O - OPEN,
READ, SORT, WRITE, CLOSE is done "automagically" - using the sort
specification of the SORT statement's ascending/descending KEYS.

3) If an Input procedure is specified, then the "initial-type" I/O, e.g.
OPEN and READs of input files, must be "manually" coded by the programmer.
This allows for getting records from "odd" places like SQL, IMS (or other
database) or for using editing logic (such as combining multiple input
records from multiple files into a single "logical" file).  (See the RELEASE
statement)

4) If an output procedure is specified, then the "logical" records are
returned to the program record by record in "sorted" order - for post-sort
processing.  Some programs (for example) just want to create a report (for
example using report writer GENERATE statement in an Output procedure) and
never actually write out the sorted record.  Others want to eliminate
certain records from the output, but need to see them in sorted order to
determine which ones to eliminate. (See RETURN statement)

5) A single SORT statement may have an Input procedure, an Output procedure,
neither, or both.

6) The procedure mentioned in an Input/Output procedure is only executed
ONCE.  Therefore, the program logic needs to handle the "loop" thru all the
records (or as many as they want to process).  In the '74 Standard (and
before) these needed to be SECTIONS and couldn't get "out" of the section.
This led to the REQUIREMENT for using a GO TO statement in order to "loop".
The SECTION restriction went away with the '85 Standard, so now Input/Output
procedures often (usually) include a PERFORM UNTIL statement.

7) For most COBOL compilers that I know of, there are also "special
registers".  The most common one that I know about is the SORT-RETURN (which
is common for vendors with a RETURN-CODE special register).  This shows "0"
when the SORT is successful and "16" when it isn't.  It can also be SET to
"16" in the middle of an input/output procedure to terminate (with error
indication) an internal sort.

8) The '02 Standard (and some vendors as an extension to the '85 Standard)
include a "Table Sort" which allows for the sorting of elements in a group
(data division) structure that has an OCCURS in or under it.
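
To make points 3, 4 and 6 concrete, here is a minimal sketch (not part of
the original post) of a SORT with both an Input and an Output procedure,
each written as an '85-style paragraph containing a PERFORM UNTIL loop.
The file names, record layout, the "C" = cancelled convention and the
LINE SEQUENTIAL organization (a common vendor extension, not standard
COBOL) are all assumptions made up for the illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PROCSORT.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * Hypothetical file names; LINE SEQUENTIAL is an extension.
           SELECT ORDERS-IN  ASSIGN TO "orders.dat"
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT SORT-WORK  ASSIGN TO "sortwork".
       DATA DIVISION.
       FILE SECTION.
       FD  ORDERS-IN.
       01  ORDER-REC.
           05  ORDER-NO       PIC 9(5).
           05  ORDER-STATUS   PIC X.
           05  ORDER-TEXT     PIC X(24).
       SD  SORT-WORK.
       01  SORT-REC.
           05  SORT-ORDER-NO  PIC 9(5).
           05  FILLER         PIC X(25).
       WORKING-STORAGE SECTION.
       01  IN-EOF             PIC X VALUE "N".
       01  SORT-EOF           PIC X VALUE "N".
       PROCEDURE DIVISION.
       MAIN-PARA.
           SORT SORT-WORK ASCENDING KEY SORT-ORDER-NO
               INPUT PROCEDURE IS LOAD-RECORDS
               OUTPUT PROCEDURE IS LIST-RECORDS.
           STOP RUN.
      * Runs once; the loop inside it RELEASEs each wanted record.
       LOAD-RECORDS.
           OPEN INPUT ORDERS-IN
           PERFORM UNTIL IN-EOF = "Y"
               READ ORDERS-IN
                   AT END
                       MOVE "Y" TO IN-EOF
                   NOT AT END
      * Editing logic: cancelled orders never reach the sort.
                       IF ORDER-STATUS NOT = "C"
                           RELEASE SORT-REC FROM ORDER-REC
                       END-IF
               END-READ
           END-PERFORM
           CLOSE ORDERS-IN.
      * Runs once after sorting; RETURN hands back one record per call.
       LIST-RECORDS.
           PERFORM UNTIL SORT-EOF = "Y"
               RETURN SORT-WORK
                   AT END
                       MOVE "Y" TO SORT-EOF
                   NOT AT END
                       DISPLAY SORT-REC
               END-RETURN
           END-PERFORM.
```

Note that the Output procedure here only DISPLAYs the sorted records and
never writes a sorted file, which is the "report only" case from point 4.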

***

Does this get you started?

***

P.S.  For IBM mainframe customers, there are two "separate" topics that
often come up when talking about SORTs and COBOL.

1) When to use an Internal COBOL SORT versus when to use an external
(utility) SORT.

2) What is the difference between a COBOL internal SORT with Input/Output
procedures and using an "E15" (input) or "E35" SORT exit written in COBOL.

I won't go into either of these here, but thought I should mention them
anyway.  When/If you are working with SORTs in an IBM mainframe environment,
there are LOTS of things to think about and learn.  (Restart, tuning, etc)

***

P.P.S.  The SORT statement is often talked about in relationship to the
MERGE statement which is BASICALLY the same thing, but with "guaranteed to
already be sorted" input.

--
Bill Klein
wmklein <at> ix.netcom.com



William M. Klein
03-07-06 11:56 PM


Re: How does the SORT statement work?
What level of detail do you want?  Do you have a COBOL reference manual
handy?  I think that might help!   Even the draft of the 2008 standard might
prove useful here.

What may be confusing you is that INPUT PROCEDURE and USING are *mutually
exclusive* in the same SORT statement, as are OUTPUT PROCEDURE and GIVING.
You can have SORT ... USING ... GIVING, you can have SORT ... INPUT
PROCEDURE ... GIVING, you can have SORT ... USING ... OUTPUT PROCEDURE, and
you can have SORT ... INPUT PROCEDURE ... OUTPUT PROCEDURE.  But you *can't*
have INPUT PROCEDURE  ... USING or OUTPUT PROCEDURE ... GIVING.  Makes no
sense.  The syntax diagram should show curly braces around INPUT PROCEDURE
/ USING and around OUTPUT PROCEDURE / GIVING.   If it doesn't, complain to
your vendor.

When an INPUT PROCEDURE is used, it is called ONCE by the SORT mechanism,
and is expected to execute a RELEASE statement for each record it wishes to
pass to the SORT mechanism.  When an OUTPUT PROCEDURE is used, it is called
ONCE by the SORT mechanism (ostensibly) after the sorting is done, and is
expected to execute a RETURN statement for each record it wishes to retrieve
from the SORT mechanism into the program.  In this latter case, the SORT
statement terminates when the OUTPUT PROCEDURE terminus is reached.

When a USING clause is specified, the records in the specified files are
passed to the sort mechanism, and when the last record of the last file has
been passed to the sort mechanism, the sorting process begins.  When a
GIVING clause is specified, the sorted records are written into the
designated file, and when the sorted records are exhausted, the SORT
statement terminates.

The order in which the records are sorted is FIRST by data-name-1 in
ASCENDING order, and then for records that have duplicate values in
data-name-1, they are sorted in DESCENDING order by data-name-2.  If
DUPLICATES IN ORDER is specified, the order of those records that are
duplicate on *all* keys is retained in the output; if it isn't, the order in
which those duplicates appear isn't defined.  That's true whether the
records are passed to the sort via RELEASE or are contained in a USING file.

COLLATING SEQUENCE specifies the collating sequence by which the
alphanumeric comparisons necessary to accomplish the sort are performed.

>    I'm guessing it's something like the INPUT PROCEDUREs are called to
> load the records into memory, and the SORT statement does some comparison
> based on the COLLATING SEQUENCE, and then uses the OUTPUT PROCEDUREs to
> save the records somewhere; traditionally, the input and output comes from
> files, but this is not necessarily so. Am I still on track?

Each time a RELEASE statement in an INPUT PROCEDURE is executed, a record is
passed to the SORT.  Each time a RETURN statement in an OUTPUT PROCEDURE is
executed, a record is passed back to the program.  It's expected that the
INPUT PROCEDURE pass all the records to the sort before reaching its
terminus, likewise, it is expected that the OUTPUT PROCEDURE retrieve all
the records from the SORT before reaching its terminus.

>    Is the reading of records into memory done 1 record at a time, or does
> it load multiple records? If it's one at a time, wouldn't you need at
> least 2 records in memory to do a meaningful comparison to decide on a
> sort order? If it's multiple records, wouldn't you need to specify enough
> memory in your DATA DIVISION to store those multiple records?

Not sure what you mean by this.  For INPUT PROCEDURE sorts, records are
passed to the Great Sorter in the Sky one at a time through RELEASE
statements; for USING sorts, they're read by the system in whatever gulps
the implementor decides are appropriate.   How many records actually end up
in memory at any given time is an implementor issue.

>    Any idea what sorting algorithm is used? (E.g. mergesort, bubblesort,
> quicksort, insertion sort, etc.), or is that implementation defined?

It's implementor defined, so long as the results are as specified in the
language standard.  The *language* doesn't care how the sort is done -- in
theory, from a language standpoint it could be done by putting the records
out on some sort of unit-record device and having a bunch of Discalced
Cistercians lay them out on their refectory table in order before reading
them back in again.

It'd probably be easiest if you started with a simple
SORT SDNAME-1 ASCENDING KEY KEY-1
USING INPUTFILE GIVING OUTPUTFILE.
and work up in complexity from there.
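
For anyone who wants to try that, a complete program wrapped around the
statement might look roughly like the sketch below. Only SDNAME-1, KEY-1,
INPUTFILE and OUTPUTFILE come from the statement above; the physical file
names, the 30-byte record layout and the LINE SEQUENTIAL organization (a
vendor extension) are assumptions for the example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SIMPLESORT.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * Hypothetical file names; LINE SEQUENTIAL is an extension.
           SELECT INPUTFILE  ASSIGN TO "input.dat"
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT OUTPUTFILE ASSIGN TO "output.dat"
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT SDNAME-1   ASSIGN TO "sortwork".
       DATA DIVISION.
       FILE SECTION.
       FD  INPUTFILE.
       01  INPUT-REC          PIC X(30).
       FD  OUTPUTFILE.
       01  OUTPUT-REC         PIC X(30).
      * The SD describes records "in transit"; the key lives here.
       SD  SDNAME-1.
       01  SORT-REC.
           05  KEY-1          PIC X(10).
           05  FILLER         PIC X(20).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * The SORT opens, reads, writes and closes both files itself.
           SORT SDNAME-1 ASCENDING KEY KEY-1
               USING INPUTFILE GIVING OUTPUTFILE.
           STOP RUN.
```

Note that the program never OPENs or CLOSEs INPUTFILE or OUTPUTFILE; with
USING and GIVING the sort mechanism does that on the program's behalf.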

-Chuck Stevens



Chuck Stevens
03-07-06 11:56 PM


Re: How does the SORT statement work?
Oliver Wong wrote:

>   INPUT PROCEDURE IS procedure-name-1a THROUGH procedure-name-1z
>     USING filename-2a, filename-2b, ...
>
>   OUTPUT PROCEDURE IS procedure-name-2a THROUGH procedure-name-2z
>     GIVING filename-3a, filename-3b, ...
>
> where the ... isn't literally part of the syntax, but means that the
> previous terms can be repeated arbitrarily many times.

The syntax signals that you failed to notice are the ones that specify
that 'INPUT PROCEDURE' and 'USING' are alternates - use one or the
other, not both.  Same for 'OUTPUT PROCEDURE' and 'GIVING'.

The statement will sort records either from a file(s) or from the
program creating and RELEASEing records and will either output to a
file(s) or allow a procedure to RETURN these in order.


Richard
03-07-06 11:56 PM


Re: How does the SORT statement work?
Thank you,

I have a much better understanding of how the statement works now. I
was under the incorrect impression that the INPUT PROCEDURE was called
multiple times, once for each record that was to be retrieved, and similarly
for the OUTPUT PROCEDURE. And yes, I thought the USING clause modified the
INPUT PROCEDURE, rather than being an exclusive alternative to the INPUT
PROCEDURE (and similarly for the OUTPUT PROCEDURE).

Just to be sure I understand the sequence of steps:

(*) The INPUT PROCEDURE (assuming one is present) gets called once.
(*) Within this procedure, the RELEASE statement will get executed multiple
times, once for every record that is to participate in the sorting. These
records go to some magical place where the actual sorting will occur.
(*) The OUTPUT PROCEDURE is then called once.
(*) Within this procedure, the RETURN statement will get executed multiple
times. Let's say it appears as "RETURN filename-1 INTO record-1". After each
execution of the RETURN statement, the contents of record-1 will change,
representing each of the sorted records, and the records will show up in
sorted order.
(*) Control then returns to the next statement after the SORT statement that
was just executed.

- Oliver


Oliver Wong
03-07-06 11:56 PM


Re: How does the SORT statement work?
Pretty much.  If you encounter AT END on a RETURN it's a good idea to end up
at the OUTPUT PROCEDURE terminus without executing another RETURN, otherwise
you'll end up with the fatal EC-SORT-MERGE-RETURN exception.   There's no
onus against reaching the OUTPUT PROCEDURE terminus *without* having
RETURNed all the records, though, as far as I know.

The expected behavior of SORT -- the basic phases and the relationship
between input and output -- is pretty well detailed in the standard, and
should be similarly specified in the reference materials for individual
implementations as well.

-Chuck Stevens



Chuck Stevens
03-07-06 11:56 PM


Re: How does the SORT statement work?
On Tue, 7 Mar 2006 13:36:32 -0800, "Chuck Stevens"
<charles.stevens@unisys.com> wrote:

>Pretty much.  If you encounter AT END on a RETURN it's a good idea to end
>up at the OUTPUT PROCEDURE terminus without executing another RETURN,
>otherwise you'll end up with the fatal EC-SORT-MERGE-RETURN exception.
>There's no onus against reaching the OUTPUT PROCEDURE terminus *without*
>having RETURNed all the records, though, as far as I know.

One thing that I've seen beginners do - drop through from the sort
command to the input or output procedure when done.    Make sure your
sort parts are completely separate.
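
A small sketch of what I mean, with made-up file and paragraph names (and
LINE SEQUENTIAL, which is a vendor extension, assumed for the input file).
The STOP RUN after the SORT is what keeps control from dropping into the
output procedure a second time:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NOFALL.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * Hypothetical file names; LINE SEQUENTIAL is an extension.
           SELECT NAMES-IN   ASSIGN TO "names.dat"
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT SORT-WORK  ASSIGN TO "sortwork".
       DATA DIVISION.
       FILE SECTION.
       FD  NAMES-IN.
       01  NAMES-REC          PIC X(20).
       SD  SORT-WORK.
       01  SORT-REC.
           05  SORT-NAME      PIC X(20).
       WORKING-STORAGE SECTION.
       01  SORT-EOF           PIC X VALUE "N".
       PROCEDURE DIVISION.
       MAIN-PARA.
           SORT SORT-WORK ASCENDING KEY SORT-NAME
               USING NAMES-IN
               OUTPUT PROCEDURE IS SHOW-NAMES.
      * Without this STOP RUN, control would fall into SHOW-NAMES
      * and execute RETURN outside of any active SORT.
           STOP RUN.
       SHOW-NAMES.
           PERFORM UNTIL SORT-EOF = "Y"
               RETURN SORT-WORK
                   AT END
                       MOVE "Y" TO SORT-EOF
                   NOT AT END
                       DISPLAY SORT-NAME
               END-RETURN
           END-PERFORM.
```

The loop also follows the earlier advice about AT END: once the flag is
set, the paragraph ends without executing another RETURN.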

Howard Brazee
03-07-06 11:56 PM


Re: How does the SORT statement work?
Oliver Wong wrote:
>
>    Any idea what sorting algorithm is used? (E.g. mergesort,
> bubblesort, quicksort, insertion sort, etc.), or is that
> implementation defined?

The algorithm used is determined by the writer of the SORT code - it may or
may not be the same people who wrote the compiler. Further, the algorithm is
(a trade) secret. That's why they get paid the big bucks: for a fast sort
program.



HeyBub
03-07-06 11:56 PM


Re: How does the SORT statement work?
In the case of Unisys MCP, the actual ordering of the records is performed
by the operating system.   The '74 compiler may generate code to call this
routine directly (and for USING/GIVING, it may generate INPUT or OUTPUT
procedures behind the user's back or it may actually pass the files), or it
may generate code to call an entry point in a library that performs some
execution-time optimization shenanigans and then calls the sort routine
(again, using files or doing the equivalent of INPUT and OUTPUT procedures).
The code generated by the '85 compiler uses the library exclusively.   In
either case, how the MCP accomplishes the reordering of the records is None
Of The Compiler's Business, so long as the records end up in the specified
order.

-Chuck Stevens



Chuck Stevens
03-07-06 11:56 PM


Re: How does the SORT statement work?
Oliver Wong wrote:
>    I know there was a thread posted here earlier about "SORT", but it
> seems to be referring to "SORT cards", and I think that's something
> different than the "SORT statement" I'm trying to learn more about.
> Here's the syntax of the SORT statement:

Oliver,

Told you, I don't do SORTs, but as it's you, here is the brief reference
by Jerome Garfunkel to SORTs in his 'COBOL '85 Example Book'. Hope it is of
some help :-

Sort Sort-Work-File
    on ascending key work-order-number
    with duplicates in order
    input procedure is WORK-ORDER-VALIDATION-PROCESS
    giving Daily-Work-Orders-Seq
           Daily-Work-Orders-Relative
           Daily-Work-Orders-Indexed

Jimmy

James J. Gavan
03-07-06 11:56 PM

