Code Comments
Programming Forum and web based access to our favorite programming groups.This is a re=posting of the sorts demo with one added test: inserting record
s
into an indexed file and reading them back in sequence. Jerry Mouse said thi
s
technique is "efficient". In FACT, it ran 15 times slower than a Cobol sort
and
more than 100 times slower than a fast sort.
* I'll take Sorts And Trees for a thousand, Alex
*
* These two programs -- Sorts and Trees -- demonstrate Cobol Features
* overlooked by most programmers, while running timing tests on
* sort algorithms.
*
* Glossary.
* We will see three data structures:
*
* Table An array defined with Occurs. Same as a sequential file.
*
* List Each entry has a pointer or index containing the address
* of the next entry, and optionally the previous. Same as
* a 'random' file.
*
* Tree Like a family tree, where each node contains one data
* value and pointers to up to two children, named Left,
* containing entries less than the node's data, and Right.
* Nodes with fewer than two children are called Leaves.
* Same as an indexed file.
*
* Sorts program
*
* Feature: Nested Programs
*
* Cobol is often criticized for having 'an ocean of data' in
* working-storage that is globally accessible to all parts of the
* program. It is also criticized for having overly large monolithic
* programs. Nested programs, seen below, are one answer to that
* objection; we will see another in Trees. They are written like
* external called programs, but they are in the same source file
* with the caller. If they were separately compiled, there would
* be additional administrative work to list them on program
* inventories, promote them through Change Management, perhaps
* list them as dependents in make files.
*
* Nested programs have their own private working-storage and lack
* access to their parent(s)' working-storage unless passed as
* a parameter or declared Global. They have been in the ANSI
* Cobol Standard since 1985.
*
* Test 0. Indexed file sort
*
* Inserts records into an indexed file, then reads them back in sequence.
* This is included as a bad example, the wrong way to sort data.
* It runs 15 times slower than a Cobol sort.
*
* Test 1. Cobol file sort
*
* This is a familiar Cobol sort with input and output procedures,
* release and return records. No Features here, it is shown to provide a
* timing baseline.
*
* Test 2. Micro Focus table sort
*
* Feature: Table sort
*
* Micro Focus provides an extension allowing you sort a table in memory.
* When we time it, we find its speed is almost identical to the file
* sort in test 1, and half the speed it 'should' be as shown in test 3.
* Apparently, under the covers it is releasing records to and returning
* records from the file sort. This was an extension to the 1985 Standard,
* is included in the 2002 Standard.
*
* Test 3. Quick sort
*
* Feature: Calling C language functions from Cobol
*
* This test calls qsort() in the Standard C Library, and directs it to
* call memcmp().
*
* How do you tell Cobol to use the C calling convention? Not necessary,
* Micro Focus uses the C convention by default. How do you tell the
* linker (loader) to look in C libraries? Not necessary. We stopped
* doing static binding (link edit) many years ago. Binding is now done
* at execution time by a program named "ld", which follows
* LD_LIBRARY_PATH. The C Standard Library is in that path by default. In
* short, you don't have to do anything special. You can see libraries
* that will be loaded at execution time by typing "ldd sorts".
*
* C libraries contain a plethora of useful functions. See
* http://www.tcnj.edu/~cs/doc/CStdLib.html
* also
* http://www.bebits.com/app/2917
* also
* a hundred others in /usr/lib, many defined in Standards
*
* The speed of quick sort is said to be proportional to N log2(N), where
* N is the number of entries and log2 is the base 2 logarithm. Most
* high-speed algorithms run at this speed, for example Shell, Heap and
* Comb sorts. Low-speed sorts such as Bubble run at N*N, are suitable
* only for small values of N, say less than 100.
*
* Tests 4 & 5. Radix sort followed by Insertion sort
*
* Features: The radix algorithm
* Linked lists using indexes
*
* When we had to sort a large number of punched cards, we put them into
* approximate sequence by sorting on the first few columns with a card
* sorter. Next we put them through a machine called a collator, which
* pulled cards out of order, typically 5-10%. We sorted those cards on
* the full key, then used a collator to merge them into the big deck.
*
* Would you believe the card sorting algorithm, written in Cobol, is
* five times faster than the best high-speed algorithms written in C,
* and about ten times faster than the sorts delivered by Micro Focus?
*
* The algorithm inspects one 'column' at a time, right to left, tossing
* 'cards' into 'pockets' by linking the 'card' to what had previously
* been the last in the 'pocket'. At the end of each pass, it 'gathers'
* the pocket strings into a single string ('deck') by linking the last
* 'card' in a 'pocket' to the first 'card' in the next 'pocket'.
*
* A 'column' can be any convenient size -- one bit, one byte or one
* word. The demo program defines a column as two bytes, 16 bits. This
* requires 65,536 'pockets', of which only a few thousand are actually
* used.
*
* When cards are approximately in order, the program 'merges' the ones
* out of order into place with an Insertion Sort. As a primary
* algorithm, Insertion is slow, close to N*N. But when the entries are
* almost in sequence, it is quite fast. It almost always puts them in
* order with one pass, then makes a second pass to verify that the job
* is complete. It takes a third pass only when there are two consecutive
* steps down. In the 50,000 item test, that doesn't happen, it runs in
* two passes. If radix had sorted on seven bytes rather than six, they
* would have been in perfect order. Insertion would have taken only one
* pass.
*
* The Radix-Insertion algorithm speed is proportional to kN + 2N+, where
* k is the number of columns sorted by Radix (in this case 3) and the +
* following 2N is close-range rearrangements by Insertion. In THEORY,
* the relative speeds of test 3 vs. test 5 should be 50,000 *
* log2(50,000) = 785,000 vs. 3*50,000 + 2*50,000+ =~ 300,000. The ratio
* is 2.6. In PRACTICE, it is 5.0 times as fast. As usual, the devil is
* in the details. If the test program had used subscripts rather than
* indexes, its speed would have been cut in half. Not because indexes
* are that much faster, because every subscript operation generates a
* call to the runtime whereas index operations produce inline code (seen
* via the now commented ASM options).
*
* [If your calculator doesn't have log2, take ln (base e) and divide by .69,
* which is ln(2).]
*
* To compile this "cob -x sorts.cbl". -x produces a Unix executable.
* To execute "sorts".
*
* Continued in trees.cbl
*
* ---------------------------------------------------------------------
$SET SOURCEFORMAT"FREE"
$SET NOBOUND OPT"2" NOTRUNC IBMCOMP NOCHECK
* $SET ASM ASMLIST SOURCEASM
identification division.
program-id. Sorts.
author. Robert Wagner.
* Compare speed of four sorts.
* Results: Sun SPARC, 1.8 GHz, 50,000 30-byte entries, time in
* seconds for 10 iterations.
* --- ratio ---
* 2. 3. 4. 5.
* 0. Indexed file sort 55.4 14.6 22.2 110 139
* 1. Cobol file sort 3.7 1.0 1.5 7.4 9.3
* 2. Micro Focus table sort 3.8 1.5 7.6 9.5
* 3. qsort() 2.5 5.0 6.3
* 4. radix-insertion .5 1.3
* 5. radix to linked list .4
*
* Trees.cbl contains an expanded version of these test results.
working-storage section.
01 test-data GLOBAL.
05 radix-columns comp pic s9(02).
05 test-number comp pic s9(02).
05 card occurs 50001 indexed *> the data
x-n first-n last-n previous-n-0 previous-n-1 previous-n-2 x-last.
10 sort-key.
15 key-column occurs 15 indexed x-c comp-x pic xx.
10 next-n index.
10 next-n-1 index.
01 init-data.
05 init-columns value zero comp pic s9(02).
05 init-number value zero comp pic s9(02).
05 init-card occurs 50001 indexed i-n i-next i-last.
10 init-key-1 pic v9(18).
10 init-key-2 pic v9(12).
10 init-next index.
10 init-next-1 index.
01 timer-variables.
05 repeat-factor value 10 comp pic s9(09).
05 previous-key pic x(30).
05 card-count comp pic s9(09).
05 start-time.
10 start-hours pic 9(02).
10 start-minutes pic 9(02).
10 start-seconds pic 9(02).
10 start-hundredths pic 9(02).
05 end-time.
10 end-hours pic 9(02).
10 end-minutes pic 9(02).
10 end-seconds pic 9(02).
10 end-hundredths pic 9(02).
05 elapsed-time pic z(04).9.
procedure division.
perform construct-data
display '0. Indexed file sort' with no advancing
move 0 to init-number
move 2 to repeat-factor
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'indexed-file-sort'
end-perform
perform timer-off
perform verify-sort
display '1. Cobol file sort' with no advancing
move 1 to init-number
move 10 to repeat-factor
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'cobol-sort'
end-perform
perform timer-off
perform verify-sort
display '2. Micro Focus sort' with no advancing
move 2 to init-number
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'mf-sort'
end-perform
perform timer-off
perform verify-sort
display '3. quick sort N log2(N)' with no advancing
move 3 to init-number
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'quick-sort'
end-perform
perform timer-off
perform verify-sort
display '4. radix-insertion kN + 2N+' with no advancing
move 4 to init-number
move 3 to init-columns
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'radix-sort'
end-perform
perform timer-off
move zero to init-columns
perform verify-sort
display '5. radix-insertion to linked list' with no advancing
move 5 to init-number
move 3 to init-columns
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'radix-sort'
end-perform
perform timer-off
goback
. construct-data.
set x-last, i-last to 50001
perform varying i-n from 1 by 1 until i-n = i-last
compute init-key-1 (i-n) = function random
compute init-key-2 (i-n) = function random
set i-next to i-n
set i-next up by 1
set init-next (i-n) to i-next
set init-next-1 (i-n) to i-last
end-perform
move high-values to init-card (i-last)
. initialize-test-data.
move init-data to test-data
set first-n to 1
. verify-sort.
move zero to previous-key, card-count
set x-n to first-n
if x-n equal to x-last
display 'test failed, no first'
end-if
perform until x-n equal to x-last
if sort-key (x-n) less than previous-key
display 'test failed '
previous-key space sort-key (x-n)
set x-n to x-last
else
move sort-key (x-n) to previous-key
if init-columns not equal to zero
set x-n to next-n (x-n)
else
set x-n up by 1
end-if
add 1 to card-count
end-if
end-perform
if card-count not equal to 50000
display 'test failed, count is ' card-count
end-if
. timer-on.
accept start-time from time
. timer-off.
accept end-time from time
compute elapsed-time rounded =
((((((((end-hours * 60) +
end-minutes) * 60) +
end-seconds) * 100) +
end-hundredths) -
((((((start-hours * 60) +
start-minutes) * 60) +
start-seconds) * 100) +
start-hundredths)) / 100)
* 10 / repeat-factor
display elapsed-time ' seconds'
Post Follow-up to this messagequote:
Originally posted by Robert Wagner This is a re=posting of the sorts demo with one added test: inserting record s into an indexed file and reading them back in sequence. Jerry Mouse said thi s technique is "efficient". In FACT, it ran 15 times slower than a Cobol sort and more than 100 times slower than a fast sort. * I'll take Sorts And Trees for a thousand, Alex * * These two programs -- Sorts and Trees -- demonstrate Cobol Features * overlooked by most programmers, while running timing tests on * sort algorithms. * * Glossary. * We will see three data structures: * * Table An array defined with Occurs. Same as a sequential file. * * List Each entry has a pointer or index containing the address * of the next entry, and optionally the previous. Same as * a 'random' file. * * Tree Like a family tree, where each node contains one data * value and pointers to up to two children, named Left, * containing entries less than the node's data, and Right. * Nodes with fewer than two children are called Leaves. * Same as an indexed file. * * Sorts program * * Feature: Nested Programs * * Cobol is often criticized for having 'an ocean of data' in * working-storage that is globally accessible to all parts of the * program. It is also criticized for having overly large monolithic * programs. Nested programs, seen below, are one answer to that * objection; we will see another in Trees. They are written like * external called programs, but they are in the same source file * with the caller. If they were separately compiled, there would * be additional administrative work to list them on program * inventories, promote them through Change Management, perhaps * list them as dependents in make files. * * Nested programs have their own private working-storage and lack * access to their parent(s)' working-storage unless passed as * a parameter or declared Global. They have been in the ANSI * Cobol Standard since 1985. * * Test 0. Indexed file sort * * Inserts records into an indexed file, then reads them back in sequence. * This is included as a bad example, the wrong way to sort data. * It runs 15 times slower than a Cobol sort. * * Test 1. Cobol file sort * * This is a familiar Cobol sort with input and output procedures, * release and return records. No Features here, it is shown to provide a * timing baseline. * * Test 2. Micro Focus table sort * * Feature: Table sort * * Micro Focus provides an extension allowing you sort a table in memory. * When we time it, we find its speed is almost identical to the file * sort in test 1, and half the speed it 'should' be as shown in test 3. * Apparently, under the covers it is releasing records to and returning * records from the file sort. This was an extension to the 1985 Standard, * is included in the 2002 Standard. * * Test 3. Quick sort * * Feature: Calling C language functions from Cobol * * This test calls qsort() in the Standard C Library, and directs it to * call memcmp(). * * How do you tell Cobol to use the C calling convention? Not necessary, * Micro Focus uses the C convention by default. How do you tell the * linker (loader) to look in C libraries? Not necessary. We stopped * doing static binding (link edit) many years ago. Binding is now done * at execution time by a program named "ld", which follows * LD_LIBRARY_PATH. The C Standard Library is in that path by default. In * short, you don't have to do anything special. You can see libraries * that will be loaded at execution time by typing "ldd sorts". * * C libraries contain a plethora of useful functions. See * http://www.tcnj.edu/~cs/doc/CStdLib.html * also * http://www.bebits.com/app/2917 * also * a hundred others in /usr/lib, many defined in Standards * * The speed of quick sort is said to be proportional to N log2(N), where * N is the number of entries and log2 is the base 2 logarithm. Most * high-speed algorithms run at this speed, for example Shell, Heap and * Comb sorts. Low-speed sorts such as Bubble run at N*N, are suitable * only for small values of N, say less than 100. * * Tests 4 & 5. Radix sort followed by Insertion sort * * Features: The radix algorithm * Linked lists using indexes * * When we had to sort a large number of punched cards, we put them into * approximate sequence by sorting on the first few columns with a card * sorter. Next we put them through a machine called a collator, which * pulled cards out of order, typically 5-10%. We sorted those cards on * the full key, then used a collator to merge them into the big deck. * * Would you believe the card sorting algorithm, written in Cobol, is * five times faster than the best high-speed algorithms written in C, * and about ten times faster than the sorts delivered by Micro Focus? * * The algorithm inspects one 'column' at a time, right to left, tossing * 'cards' into 'pockets' by linking the 'card' to what had previously * been the last in the 'pocket'. At the end of each pass, it 'gathers' * the pocket strings into a single string ('deck') by linking the last * 'card' in a 'pocket' to the first 'card' in the next 'pocket'. * * A 'column' can be any convenient size -- one bit, one byte or one * word. The demo program defines a column as two bytes, 16 bits. This * requires 65,536 'pockets', of which only a few thousand are actually * used. * * When cards are approximately in order, the program 'merges' the ones * out of order into place with an Insertion Sort. As a primary * algorithm, Insertion is slow, close to N*N. But when the entries are * almost in sequence, it is quite fast. It almost always puts them in * order with one pass, then makes a second pass to verify that the job * is complete. It takes a third pass only when there are two consecutive * steps down. In the 50,000 item test, that doesn't happen, it runs in * two passes. If radix had sorted on seven bytes rather than six, they * would have been in perfect order. Insertion would have taken only one * pass. * * The Radix-Insertion algorithm speed is proportional to kN + 2N+, where * k is the number of columns sorted by Radix (in this case 3) and the + * following 2N is close-range rearrangements by Insertion. In THEORY, * the relative speeds of test 3 vs. test 5 should be 50,000 * * log2(50,000) = 785,000 vs. 3*50,000 + 2*50,000+ =~ 300,000. The ratio * is 2.6. In PRACTICE, it is 5.0 times as fast. As usual, the devil is * in the details. If the test program had used subscripts rather than * indexes, its speed would have been cut in half. Not because indexes * are that much faster, because every subscript operation generates a * call to the runtime whereas index operations produce inline code (seen * via the now commented ASM options). * * [If your calculator doesn't have log2, take ln (base e) and divide by .69, * which is ln(2).] * * To compile this "cob -x sorts.cbl". -x produces a Unix executable. * To execute "sorts". * * Continued in trees.cbl * * --------------------------------------------------------------------- $SET SOURCEFORMAT"FREE" $SET NOBOUND OPT"2" NOTRUNC IBMCOMP NOCHECK * $SET ASM ASMLIST SOURCEASM identification division. program-id. Sorts. author. Robert Wagner. * Compare speed of four sorts. * Results: Sun SPARC, 1.8 GHz, 50,000 30-byte entries, time in * seconds for 10 iterations. * --- ratio --- * 2. 3. 4. 5. * 0. Indexed file sort 55.4 14.6 22.2 110 139 * 1. Cobol file sort 3.7 1.0 1.5 7.4 9.3 * 2. Micro Focus table sort 3.8 1.5 7.6 9.5 * 3. qsort() 2.5 5.0 6.3 * 4. radix-insertion .5 1.3 * 5. radix to linked list .4 * * Trees.cbl contains an expanded version of these test results. working-storage section. 01 test-data GLOBAL. 05 radix-columns comp pic s9(02). 05 test-number comp pic s9(02). 05 card occurs 50001 indexed *> the data x-n first-n last-n previous-n-0 previous-n-1 previous-n-2 x-last. 10 sort-key. 15 key-column occurs 15 indexed x-c comp-x pic xx. 10 next-n index. 10 next-n-1 index. 01 init-data. 05 init-columns value zero comp pic s9(02). 05 init-number value zero comp pic s9(02). 05 init-card occurs 50001 indexed i-n i-next i-last. 10 init-key-1 pic v9(18). 10 init-key-2 pic v9(12). 10 init-next index. 10 init-next-1 index. 01 timer-variables. 05 repeat-factor value 10 comp pic s9(09). 05 previous-key pic x(30). 05 card-count comp pic s9(09). 05 start-time. 10 start-hours pic 9(02). 10 start-minutes pic 9(02). 10 start-seconds pic 9(02). 10 start-hundredths pic 9(02). 05 end-time. 10 end-hours pic 9(02). 10 end-minutes pic 9(02). 10 end-seconds pic 9(02). 10 end-hundredths pic 9(02). 05 elapsed-time pic z(04).9. procedure division. perform construct-data display '0. Indexed file sort' with no advancing move 0 to init-number move 2 to repeat-factor perform timer-on perform repeat-factor times perform initialize-test-data call 'indexed-file-sort' end-perform perform timer-off perform verify-sort display '1. Cobol file sort' with no advancing move 1 to init-number move 10 to repeat-factor perform timer-on perform repeat-factor times perform initialize-test-data call 'cobol-sort' end-perform perform timer-off perform verify-sort display '2. Micro Focus sort' with no advancing move 2 to init-number perform timer-on perform repeat-factor times perform initialize-test-data call 'mf-sort' end-perform perform timer-off perform verify-sort display '3. quick sort N log2(N)' with no advancing move 3 to init-number perform timer-on perform repeat-factor times perform initialize-test-data call 'quick-sort' end-perform perform timer-off perform verify-sort display '4. radix-insertion kN + 2N+' with no advancing move 4 to init-number move 3 to init-columns perform timer-on perform repeat-factor times perform initialize-test-data call 'radix-sort' end-perform perform timer-off move zero to init-columns perform verify-sort display '5. radix-insertion to linked list' with no advancing move 5 to init-number move 3 to init-columns perform timer-on perform repeat-factor times perform initialize-test-data call 'radix-sort' end-perform perform timer-off goback . construct-data. set x-last, i-last to 50001 perform varying i-n from 1 by 1 until i-n = i-last compute init-key-1 (i-n) = function random compute init-key-2 (i-n) = function random set i-next to i-n set i-next up by 1 set init-next (i-n) to i-next set init-next-1 (i-n) to i-last end-perform move high-values to init-card (i-last) . initialize-test-data. move init-data to test-data set first-n to 1 . verify-sort. move zero to previous-key, card-count set x-n to first-n if x-n equal to x-last display 'test failed, no first' end-if perform until x-n equal to x-last if sort-key (x-n) less than previous-key display 'test failed ' previous-key space sort-key (x-n) set x-n to x-last else move sort-key (x-n) to previous-key if init-columns not equal to zero set x-n to next-n (x-n) else set x-n up by 1 end-if add 1 to card-count end-if end-perform if card-count not equal to 50000 display 'test failed, count is ' card-count end-if . timer-on. accept start-time from time . timer-off. accept end-time from time compute elapsed-time rounded = ((((((((end-hours * 60) + end-minutes) * 60) + end-seconds) * 100) + end-hundredths) - ((((((start-hours * 60) + start-minutes) * 60) + start-seconds) * 100) + start-hundredths)) / 100) * 10 / repeat-factor display elapsed-time ' seconds'
Post Follow-up to this messagePowered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.