Code Comments
I know there was a thread posted here earlier about "SORT", but it seems to be referring to "SORT cards", and I think that's something different than the "SORT statement" I'm trying to learn more about. Here's the syntax of the SORT statement:

    SORT filename-1
        ON ASCENDING KEY data-name-1
        ON DESCENDING KEY data-name-2
        ...
        WITH DUPLICATES IN ORDER
        COLLATING SEQUENCE IS alphabet-name
        INPUT PROCEDURE IS procedure-name-1a THROUGH procedure-name-1z
        USING filename-2a, filename-2b, ...
        OUTPUT PROCEDURE IS procedure-name-2a THROUGH procedure-name-2z
        GIVING filename-3a, filename-3b, ...

where the ... isn't literally part of the syntax, but means that the previous terms can be repeated arbitrarily many times.

So first of all, what does this statement do? Do you provide it with an unsorted file (filename-1) and it sorts it? Why does the INPUT PROCEDURE clause accept multiple filenames, how does each of these files affect the process, and similarly for the OUTPUT PROCEDURE?

I'm guessing it's something like this: the INPUT PROCEDUREs are called to load the records into memory, the SORT statement does some comparison based on the COLLATING SEQUENCE, and then it uses the OUTPUT PROCEDUREs to save the records somewhere. Traditionally, the input and output come from files, but this is not necessarily so. Am I still on track?

Is the reading of records into memory done 1 record at a time, or does it load multiple records? If it's one at a time, wouldn't you need at least 2 records in memory to do a meaningful comparison to decide on a sort order? If it's multiple records, wouldn't you need to specify enough memory in your DATA DIVISION to store those multiple records?

Any idea what sorting algorithm is used? (E.g. mergesort, bubble sort, quicksort, insertion sort, etc.), or is that implementation defined?

- Oliver
"Internal SORT - class 101" <G>

Pre-'02 Standard:

1) The SORT statement takes one OR more input files in any order (random or already sorted) and creates a single output file (or multiple copies of the same file) in the order determined by the set of SORT keys specified in the SORT statement. The position of those keys is determined by the record layout under the SD, which is a VIRTUAL file for records "in transit" from input to output.

2) If NO input or output procedure is specified, then all the I/O (OPEN, READ, SORT, WRITE, CLOSE) is done "automagically", using the ascending/descending KEYS specified in the SORT statement.

3) If an Input procedure is specified, then the input-side I/O, e.g. the OPEN and READs of the input files, must be "manually" coded by the programmer. This allows for getting records from "odd" places like SQL, IMS (or another database) or for using editing logic (such as combining multiple input records from multiple files into a single "logical" file). (See the RELEASE statement.)

4) If an Output procedure is specified, then the "logical" records are returned to the program record by record in "sorted" order, for post-sort processing. Some programs, for example, just want to create a report (e.g. using the Report Writer GENERATE statement in an Output procedure) and never actually write out the sorted records. Others want to eliminate certain records from the output, but need to see them in sorted order to determine which ones to eliminate. (See the RETURN statement.)

5) A single SORT statement may have an Input procedure, an Output procedure, neither, or both.

6) The procedure named in an Input/Output procedure is executed only ONCE. Therefore, the program logic needs to handle the "loop" through all the records (or as many as it wants to process). In the '74 Standard (and before), these needed to be SECTIONs, and control couldn't get "out" of the section. This led to the REQUIREMENT to use a GO TO statement in order to "loop". The SECTION restriction went away with the '85 Standard, so now Input/Output procedures often (usually) include a PERFORM UNTIL statement.

7) Most COBOL compilers that I know of also have "special registers". The most common one that I know about is SORT-RETURN (which is common for vendors with a RETURN-CODE special register). It shows "0" when the SORT is successful and "16" when it isn't. It can also be set to "16" in the middle of an input/output procedure to terminate an internal sort (with an error indication).

8) The '02 Standard (and some vendors, as an extension to the '85 Standard) includes a "Table Sort", which allows sorting the elements of a group (DATA DIVISION) structure that has an OCCURS in or under it.

*** Does this get you started? ***

P.S. For IBM mainframe customers, there are two "separate" topics that often come up when talking about SORTs and COBOL:

1) When to use an internal COBOL SORT versus an external (utility) SORT.

2) The difference between a COBOL internal SORT with Input/Output procedures and an "E15" (input) or "E35" (output) SORT exit written in COBOL.

I won't go into either of these here, but thought I should mention them anyway. When/if you are working with SORTs in an IBM mainframe environment, there are LOTS of things to think about and learn (restart, tuning, etc.).

*** P.P.S. The SORT statement is often talked about in relation to the MERGE statement, which is BASICALLY the same thing, but with input that is "guaranteed to already be sorted".

-- 
Bill Klein
wmklein <at> ix.netcom.com
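Bill's point 4 describes an Output procedure that must see the records in sorted order to decide which ones to drop. Here is a rough Python model of that idea (a model only, not COBOL). It assumes hypothetical (key, amount) tuples as records, and Python's stable `sorted()` stands in for the SORT. Each loop iteration stands in for one RETURN, and "keep the first record per key" is just one illustrative elimination rule.

```python
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[str, int]  # hypothetical record layout: (key, amount)


def keep_first_per_key(sorted_records: Iterable[Record]) -> Iterator[Record]:
    """Model of an OUTPUT PROCEDURE that drops records with a repeated key.

    This only works because records arrive in key order: a duplicate is
    always adjacent to the record it duplicates.
    """
    previous_key: Optional[str] = None
    for key, amount in sorted_records:   # each iteration ~ one RETURN
        if key != previous_key:
            yield (key, amount)           # ~ WRITE or GENERATE for kept records
        previous_key = key


records = [("B", 2), ("A", 1), ("B", 3), ("A", 4), ("C", 5)]
# sorted() is stable, so ties keep input order (like WITH DUPLICATES IN ORDER).
sorted_records = sorted(records, key=lambda r: r[0])
print(list(keep_first_per_key(sorted_records)))
# [('A', 1), ('B', 2), ('C', 5)]
```

Which duplicate counts as "first" is only well defined when ties keep their input order. In COBOL, that guarantee comes from WITH DUPLICATES IN ORDER.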
What level of detail do you want? Do you have a COBOL reference manual handy? I think that might help! Even the draft of the 2008 standard might prove useful here.

What may be confusing you is that INPUT PROCEDURE and USING are *mutually exclusive* in the same SORT statement, as are OUTPUT PROCEDURE and GIVING. You can have SORT ... USING ... GIVING, SORT ... INPUT PROCEDURE ... GIVING, SORT ... USING ... OUTPUT PROCEDURE, or SORT ... INPUT PROCEDURE ... OUTPUT PROCEDURE. But you *can't* have INPUT PROCEDURE ... USING or OUTPUT PROCEDURE ... GIVING; that makes no sense. The syntax diagram should show curly braces around INPUT PROCEDURE / USING and around OUTPUT PROCEDURE / GIVING. If it doesn't, complain to your vendor.

When an INPUT PROCEDURE is used, the SORT mechanism calls it ONCE, and it is expected to execute a RELEASE statement for each record it wishes to pass to the SORT mechanism. When an OUTPUT PROCEDURE is used, the SORT mechanism calls it ONCE, (ostensibly) after the sorting is done, and it is expected to execute a RETURN statement for each record it wishes to retrieve from the SORT mechanism into the program. In this latter case, the SORT statement terminates when the OUTPUT PROCEDURE terminus is reached.

When a USING clause is specified, the records in the specified files are passed to the sort mechanism, and the sorting process begins once the last record of the last file has been passed. When a GIVING clause is specified, the sorted records are written into the designated file, and the SORT statement terminates once the sorted records are exhausted.

The records are sorted FIRST by data-name-1 in ASCENDING order. Records that have duplicate values in data-name-1 are then sorted in DESCENDING order by data-name-2.
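Here is a minimal Python model of the key order just described: ascending on the major key, then descending on the minor key within ties. The field names `dept` and `salary` are hypothetical stand-ins for data-name-1 and data-name-2. The sketch relies on Python's sort being stable. Sorting by the minor key first and then by the major key gives the combined order, and this works for alphanumeric keys as well as numeric ones.

```python
def sort_asc_desc(records, major, minor):
    """Order by `major` ascending, then by `minor` descending within equal majors."""
    # Pass 1: minor key, descending (reverse=True still keeps the sort stable).
    by_minor = sorted(records, key=lambda r: r[minor], reverse=True)
    # Pass 2: major key, ascending; stability preserves pass-1 order within ties.
    return sorted(by_minor, key=lambda r: r[major])


records = [
    {"dept": "SALES", "salary": 300},
    {"dept": "ADMIN", "salary": 100},
    {"dept": "SALES", "salary": 500},
    {"dept": "ADMIN", "salary": 700},
]

for r in sort_asc_desc(records, "dept", "salary"):
    print(r["dept"], r["salary"])
# ADMIN 700
# ADMIN 100
# SALES 500
# SALES 300
```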
If DUPLICATES IN ORDER is specified, the order of records that are duplicates on *all* keys is retained in the output. If it isn't, the order in which those duplicates appear isn't defined. That's true whether the records are passed to the sort via RELEASE or are contained in a USING file.

COLLATING SEQUENCE specifies the collating sequence used for the alphanumeric comparisons needed to accomplish the sort.

> I'm guessing it's something like the INPUT PROCEDUREs are called to load
> the records into memory, and the SORT statement does some comparison based
> on the COLLATING SEQUENCE, and then uses the OUTPUT PROCEDUREs to save the
> records somewhere; traditionally, the input and output comes from files,
> but this is not necessarily so. Am I still on track?

Each time a RELEASE statement in an INPUT PROCEDURE is executed, a record is passed to the SORT. Each time a RETURN statement in an OUTPUT PROCEDURE is executed, a record is passed back to the program. The INPUT PROCEDURE is expected to pass all the records to the sort before reaching its terminus. Likewise, the OUTPUT PROCEDURE is expected to retrieve all the records from the SORT before reaching its terminus.

> Is the reading of records into memory done 1 record at a time, or does
> it load multiple records? If it's one at a time, wouldn't you need at
> least 2 records in memory to do a meaningful comparison to decide on a
> sort order? If it's multiple records, wouldn't you need to specify enough
> memory in your DATA DIVISION to store those multiple records?

Not sure what you mean by this. For INPUT PROCEDURE sorts, records are passed to the Great Sorter in the Sky one at a time through RELEASE statements. For USING sorts, they're read by the system in whatever gulps the implementor decides are appropriate. How many records actually end up in memory at any given time is an implementor issue.

> Any idea what sorting algorithm is used? (E.g. mergesort, bubble sort,
> quicksort, insertion sort, etc.), or is that implementation defined?

It's implementor-defined, so long as the results are as specified in the language standard. The *language* doesn't care how the sort is done. In theory, from a language standpoint, it could be done by putting the records out on some sort of unit-record device and having a bunch of Discalced Cistercians lay them out on their refectory table in order before reading them back in again.

It'd probably be easiest if you started with a simple

    SORT SDNAME-1 ASCENDING KEY KEY-1 USING INPUTFILE GIVING OUTPUTFILE.

and worked up in complexity from there.

-Chuck Stevens
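To make COLLATING SEQUENCE and DUPLICATES IN ORDER concrete, here is a small Python model. The alphabet is hypothetical: it makes digits collate after letters, unlike ASCII. The sketch assumes every key character appears in that alphabet and that keys share the same fixed length, as COBOL record fields would. Python's stable `sorted()` plays the role of a sort with WITH DUPLICATES IN ORDER.

```python
# Hypothetical alphabet: space, then letters, then digits.
ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
RANK = {ch: position for position, ch in enumerate(ALPHABET)}


def collate_key(text):
    """Translate each character to its position in ALPHABET for comparison."""
    return [RANK[ch] for ch in text]


# (key, payload) pairs; the two "A1" records are duplicates on the key.
records = [("A1", "first"), ("1A", "second"), ("A1", "third"), ("AB", "fourth")]
result = sorted(records, key=lambda r: collate_key(r[0]))
print(result)
# [('AB', 'fourth'), ('A1', 'first'), ('A1', 'third'), ('1A', 'second')]
```

Under native ASCII ordering, "1A" would sort first, since digits precede letters there. With this alphabet it sorts last. The two "A1" records keep their input order, which is what DUPLICATES IN ORDER guarantees; without that phrase, their relative order is undefined.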
Oliver Wong wrote:
> INPUT PROCEDURE IS procedure-name-1a THROUGH procedure-name-1z
> USING filename-2a, filename-2b, ...
>
> OUTPUT PROCEDURE IS procedure-name-2a THROUGH procedure-name-2z
> GIVING filename-3a, filename-3b, ...
>
> where the ... isn't literally part of the syntax, but means that the
> previous terms can be repeated arbitrarily many times.

The syntax signals you failed to notice are the ones specifying that INPUT PROCEDURE and USING are alternatives: use one or the other, not both. The same goes for OUTPUT PROCEDURE and GIVING. The statement sorts records either from one or more files or from a program procedure that creates and RELEASEs them. It then either writes the output to one or more files or lets a procedure RETURN the records in order.
Thank you,

I have a much better understanding of how the statement works now. I was under the incorrect impression that the INPUT PROCEDURE was called multiple times, once for each record to be retrieved, and similarly for the OUTPUT PROCEDURE. And yes, I thought the USING clause modified the INPUT PROCEDURE, rather than being an exclusive alternative to it (and similarly for the OUTPUT PROCEDURE).

Just to be sure I understand the sequence of steps:

(*) The INPUT PROCEDURE (assuming one is present) gets called once.
(*) Within this procedure, the RELEASE statement gets executed multiple times, once for every record that is to participate in the sort. These records go to some magical place where the actual sorting will occur.
(*) The OUTPUT PROCEDURE is then called once.
(*) Within this procedure, the RETURN statement gets executed multiple times. Let's say it appears as "RETURN filename-1 INTO record-1". After each execution of the RETURN statement, record-1 holds the next record, so the records show up in sorted order.
(*) Control then returns to the next statement after the SORT statement that was just executed.

- Oliver
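Oliver's summary can be modeled as a toy Python class. It sketches the control flow only; it is not how any real sort package works. The names `SortEngine`, `release`, and `return_` are hypothetical, and the `sorted()` call stands in for whatever algorithm the implementor uses. `None` serves as the AT END signal, so records themselves must not be `None`.

```python
class SortEngine:
    """Toy model of SORT ... INPUT PROCEDURE ... OUTPUT PROCEDURE."""

    def __init__(self, key=None):
        self._key = key
        self._released = []   # records passed in by RELEASE
        self._sorted = []
        self._next = 0

    def release(self, record):
        """RELEASE: hand one record to the sort."""
        self._released.append(record)

    def return_(self):
        """RETURN: next record in sorted order, or None for AT END."""
        if self._next >= len(self._sorted):
            return None
        record = self._sorted[self._next]
        self._next += 1
        return record

    def run(self, input_procedure, output_procedure):
        input_procedure(self)    # called exactly once; it loops over RELEASE
        self._sorted = sorted(self._released, key=self._key)
        output_procedure(self)   # called exactly once; it loops over RETURN
        # Returning from run() ~ control passes to the statement after SORT.


def input_procedure(sort):
    for line in ["pear", "apple", "fig"]:  # stands in for READs of some source
        sort.release(line)                  # one RELEASE per record


received = []


def output_procedure(sort):
    record = sort.return_()
    while record is not None:               # loop until AT END
        received.append(record)
        record = sort.return_()


SortEngine().run(input_procedure, output_procedure)
print(received)  # ['apple', 'fig', 'pear']
```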
Pretty much. If you encounter AT END on a RETURN, it's a good idea to end up at the OUTPUT PROCEDURE terminus without executing another RETURN; otherwise you'll end up with the fatal EC-SORT-MERGE-RETURN exception. There's no onus against reaching the OUTPUT PROCEDURE terminus *without* having RETURNed all the records, though, as far as I know.

The expected behavior of SORT (the basic phases and the relationship between input and output) is pretty well detailed in the standard, and should be similarly specified in the reference materials for individual implementations as well.

-Chuck Stevens
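Chuck's warning about RETURN after AT END can be modeled in Python as well. `SortMergeReturnError` is a hypothetical stand-in for the EC-SORT-MERGE-RETURN exception condition. `SortedFile.return_` (also a hypothetical name) reports AT END by returning `(False, None)`. The model also allows stopping early, matching the note that not every record has to be RETURNed.

```python
class SortMergeReturnError(RuntimeError):
    """Stand-in for the fatal EC-SORT-MERGE-RETURN exception condition."""


class SortedFile:
    """Model of the sort file as seen from an OUTPUT PROCEDURE."""

    def __init__(self, records):
        self._records = list(records)
        self._pos = 0
        self._at_end_reported = False

    def return_(self):
        """Return (True, record), or (False, None) for the AT END condition."""
        if self._at_end_reported:
            raise SortMergeReturnError("RETURN executed after AT END")
        if self._pos == len(self._records):
            self._at_end_reported = True
            return False, None
        record = self._records[self._pos]
        self._pos += 1
        return True, record


# Correct pattern: stop RETURNing as soon as AT END is seen.
sort_file = SortedFile(["a", "b"])
returned = []
while True:
    ok, record = sort_file.return_()
    if not ok:        # AT END: head for the procedure's end, no more RETURNs
        break
    returned.append(record)
print(returned)       # ['a', 'b']

# The mistake: one more RETURN after AT END.
try:
    sort_file.return_()
except SortMergeReturnError as exc:
    print("fatal:", exc)  # fatal: RETURN executed after AT END
```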
On Tue, 7 Mar 2006 13:36:32 -0800, "Chuck Stevens" <charles.stevens@unisys.com> wrote:

>Pretty much. If you encounter AT END on a RETURN it's a good idea to end up
>at the OUTPUT PROCEDURE terminus without executing another RETURN, otherwise
>you'll end up with the fatal EC-SORT-MERGE-RETURN exception. There's no
>onus against reaching the OUTPUT PROCEDURE terminus *without* having
>RETURNed all the records, though, as far as I know.

One thing I've seen beginners do is let control drop through from the SORT statement into the input or output procedure once the sort is done. Make sure your sort procedures are completely separate from the code that follows the SORT.
Oliver Wong wrote:
>
> Any idea what sorting algorithm is used? (E.g. mergesort,
> bubble sort, quicksort, insertion sort, etc.), or is that
> implementation defined?

The algorithm used is determined by the writer of the SORT code, who may or may not be the same people who wrote the compiler. Further, the algorithm is a trade secret. That's why they get paid the big bucks: for a fast sort program.
In the case of Unisys MCP, the actual ordering of the records is performed by the operating system. The '74 compiler may generate code that calls this routine directly. For USING/GIVING, it may generate INPUT or OUTPUT procedures behind the user's back, or it may actually pass the files. Alternatively, it may generate code that calls an entry point in a library that performs some execution-time optimization shenanigans and then calls the sort routine (again, either passing files or doing the equivalent of INPUT and OUTPUT procedures). The code generated by the '85 compiler uses the library exclusively. In either case, how the MCP accomplishes the reordering of the records is None Of The Compiler's Business, so long as the records end up in the specified order.

-Chuck Stevens
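Chuck's remark that a compiler may implement USING/GIVING by generating INPUT and OUTPUT procedures behind the user's back can be illustrated with a Python sketch. Everything here is hypothetical. Files are modeled as Python lists, `release` is a callback, and `run_sort` stands in for the sort mechanism. This is not a description of how the Unisys compilers are actually written.

```python
def make_using_procedure(input_files):
    """Build the input procedure a compiler might generate for USING f1, f2, ..."""
    def input_procedure(release):
        for input_file in input_files:    # files in the order listed in USING
            for record in input_file:     # ~ READ until AT END
                release(record)           # ~ RELEASE
    return input_procedure


def make_giving_procedure(output_files):
    """Build the output procedure a compiler might generate for GIVING f1, f2, ..."""
    def output_procedure(sorted_records):
        for record in sorted_records:     # ~ RETURN until AT END
            for output_file in output_files:
                output_file.append(record)  # ~ WRITE to every GIVING file
    return output_procedure


def run_sort(input_procedure, output_procedure, key=None):
    released = []
    input_procedure(released.append)                    # input phase
    output_procedure(iter(sorted(released, key=key)))   # sort, then output phase


file_a, file_b = [3, 1], [2, 5, 4]
copy_1, copy_2 = [], []
run_sort(make_using_procedure([file_a, file_b]),
         make_giving_procedure([copy_1, copy_2]))
print(copy_1, copy_2)  # [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
```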
Oliver Wong wrote:
> I know there was a thread posted here earlier about "SORT", but it
> seems to be referring to "SORT cards", and I think that's something
> different than the "SORT statement" I'm trying to learn more about.
> Here's the syntax of the SORT statement:

Oliver,

Told you, I don't do SORTs, but as it's you, here is the brief reference to SORTs by Jerome Garfunkel in his 'COBOL '85 Example Book'. Hope it is of some help:

    Sort Sort-Work-File
        on ascending key work-order-number
        with duplicates in order
        input procedure is WORK-ORDER-VALIDATION-PROCESS
        giving Daily-Work-Orders-Seq
               Daily-Work-Orders-Relative
               Daily-Work-Orders-Indexed

Jimmy
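Garfunkel's example pairs an INPUT PROCEDURE with GIVING. The rough Python model below filters work orders before releasing them. The validation rule (quantity must be positive) and the field names are hypothetical. So are the three lists standing in for the sequential, relative, and indexed GIVING files. Real relative and indexed files have their own organization rules, which this model ignores.

```python
def work_order_validation_process(work_orders, release):
    """Hypothetical INPUT PROCEDURE: RELEASE only orders that pass validation."""
    for order in work_orders:
        if order["qty"] > 0:    # hypothetical rule: reject non-positive quantities
            release(order)


work_orders = [
    {"work_order_number": 20, "qty": 1},
    {"work_order_number": 10, "qty": 0},   # rejected by the validation rule
    {"work_order_number": 10, "qty": 5},
    {"work_order_number": 20, "qty": 2},
]

released = []
work_order_validation_process(work_orders, released.append)

# Stable sort ~ ON ASCENDING KEY work-order-number WITH DUPLICATES IN ORDER.
sorted_orders = sorted(released, key=lambda o: o["work_order_number"])

# GIVING three files: each receives the same sorted sequence.
daily_seq = list(sorted_orders)
daily_relative = list(sorted_orders)
daily_indexed = list(sorted_orders)

print([(o["work_order_number"], o["qty"]) for o in daily_seq])
# [(10, 5), (20, 1), (20, 2)]
```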
Post Follow-up to this messagePowered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.