Home > Archive > Cobol > July 2004 > Demo: Sorts (revised)
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Demo: Sorts (revised)
|
|
| Robert Wagner 2004-06-25, 6:50 pm |
| This is a re=posting of the sorts demo with one added test: inserting records
into an indexed file and reading them back in sequence. Jerry Mouse said this
technique is "efficient". In FACT, it ran 15 times slower than a Cobol sort and
more than 100 times slower than a fast sort.
* I'll take Sorts And Trees for a thousand, Alex
*
* These two programs -- Sorts and Trees -- demonstrate Cobol Features
* overlooked by most programmers, while running timing tests on
* sort algorithms.
*
* Glossary.
* We will see three data structures:
*
* Table An array defined with Occurs. Same as a sequential file.
*
* List Each entry has a pointer or index containing the address
* of the next entry, and optionally the previous. Same as
* a 'random' file.
*
* Tree Like a family tree, where each node contains one data
* value and pointers to up to two children, named Left,
* containing entries less than the node's data, and Right.
* Nodes with fewer than two children are called Leaves.
* Same as an indexed file.
*
* Sorts program
*
* Feature: Nested Programs
*
* Cobol is often criticized for having 'an ocean of data' in
* working-storage that is globally accessible to all parts of the
* program. It is also criticized for having overly large monolithic
* programs. Nested programs, seen below, are one answer to that
* objection; we will see another in Trees. They are written like
* external called programs, but they are in the same source file
* with the caller. If they were separately compiled, there would
* be additional administrative work to list them on program
* inventories, promote them through Change Management, perhaps
* list them as dependents in make files.
*
* Nested programs have their own private working-storage and lack
* access to their parent(s)' working-storage unless passed as
* a parameter or declared Global. They have been in the ANSI
* Cobol Standard since 1985.
*
* Test 0. Indexed file sort
*
* Inserts records into an indexed file, then reads them back in sequence.
* This is included as a bad example, the wrong way to sort data.
* It runs 15 times slower than a Cobol sort.
*
* Test 1. Cobol file sort
*
* This is a familiar Cobol sort with input and output procedures,
* release and return records. No Features here, it is shown to provide a
* timing baseline.
*
* Test 2. Micro Focus table sort
*
* Feature: Table sort
*
* Micro Focus provides an extension allowing you sort a table in memory.
* When we time it, we find its speed is almost identical to the file
* sort in test 1, and half the speed it 'should' be as shown in test 3.
* Apparently, under the covers it is releasing records to and returning
* records from the file sort. This was an extension to the 1985 Standard,
* is included in the 2002 Standard.
*
* Test 3. Quick sort
*
* Feature: Calling C language functions from Cobol
*
* This test calls qsort() in the Standard C Library, and directs it to
* call memcmp().
*
* How do you tell Cobol to use the C calling convention? Not necessary,
* Micro Focus uses the C convention by default. How do you tell the
* linker (loader) to look in C libraries? Not necessary. We stopped
* doing static binding (link edit) many years ago. Binding is now done
* at execution time by a program named "ld", which follows
* LD_LIBRARY_PATH. The C Standard Library is in that path by default. In
* short, you don't have to do anything special. You can see libraries
* that will be loaded at execution time by typing "ldd sorts".
*
* C libraries contain a plethora of useful functions. See
* http://www.tcnj.edu/~cs/doc/CStdLib.html
* also
* http://www.bebits.com/app/2917
* also
* a hundred others in /usr/lib, many defined in Standards
*
* The speed of quick sort is said to be proportional to N log2(N), where
* N is the number of entries and log2 is the base 2 logarithm. Most
* high-speed algorithms run at this speed, for example Shell, Heap and
* Comb sorts. Low-speed sorts such as Bubble run at N*N, are suitable
* only for small values of N, say less than 100.
*
* Tests 4 & 5. Radix sort followed by Insertion sort
*
* Features: The radix algorithm
* Linked lists using indexes
*
* When we had to sort a large number of punched cards, we put them into
* approximate sequence by sorting on the first few columns with a card
* sorter. Next we put them through a machine called a collator, which
* pulled cards out of order, typically 5-10%. We sorted those cards on
* the full key, then used a collator to merge them into the big deck.
*
* Would you believe the card sorting algorithm, written in Cobol, is
* five times faster than the best high-speed algorithms written in C,
* and about ten times faster than the sorts delivered by Micro Focus?
*
* The algorithm inspects one 'column' at a time, right to left, tossing
* 'cards' into 'pockets' by linking the 'card' to what had previously
* been the last in the 'pocket'. At the end of each pass, it 'gathers'
* the pocket strings into a single string ('deck') by linking the last
* 'card' in a 'pocket' to the first 'card' in the next 'pocket'.
*
* A 'column' can be any convenient size -- one bit, one byte or one
* word. The demo program defines a column as two bytes, 16 bits. This
* requires 65,536 'pockets', of which only a few thousand are actually
* used.
*
* When cards are approximately in order, the program 'merges' the ones
* out of order into place with an Insertion Sort. As a primary
* algorithm, Insertion is slow, close to N*N. But when the entries are
* almost in sequence, it is quite fast. It almost always puts them in
* order with one pass, then makes a second pass to verify that the job
* is complete. It takes a third pass only when there are two consecutive
* steps down. In the 50,000 item test, that doesn't happen, it runs in
* two passes. If radix had sorted on seven bytes rather than six, they
* would have been in perfect order. Insertion would have taken only one
* pass.
*
* The Radix-Insertion algorithm speed is proportional to kN + 2N+, where
* k is the number of columns sorted by Radix (in this case 3) and the +
* following 2N is close-range rearrangements by Insertion. In THEORY,
* the relative speeds of test 3 vs. test 5 should be 50,000 *
* log2(50,000) = 785,000 vs. 3*50,000 + 2*50,000+ =~ 300,000. The ratio
* is 2.6. In PRACTICE, it is 5.0 times as fast. As usual, the devil is
* in the details. If the test program had used subscripts rather than
* indexes, its speed would have been cut in half. Not because indexes
* are that much faster, because every subscript operation generates a
* call to the runtime whereas index operations produce inline code (seen
* via the now commented ASM options).
*
* [If your calculator doesn't have log2, take ln (base e) and divide by .69,
* which is ln(2).]
*
* To compile this "cob -x sorts.cbl". -x produces a Unix executable.
* To execute "sorts".
*
* Continued in trees.cbl
*
* ---------------------------------------------------------------------
$SET SOURCEFORMAT"FREE"
$SET NOBOUND OPT"2" NOTRUNC IBMCOMP NOCHECK
* $SET ASM ASMLIST SOURCEASM
identification division.
program-id. Sorts.
author. Robert Wagner.
* Compare speed of four sorts.
* Results: Sun SPARC, 1.8 GHz, 50,000 30-byte entries, time in
* seconds for 10 iterations.
* --- ratio ---
* 2. 3. 4. 5.
* 0. Indexed file sort 55.4 14.6 22.2 110 139
* 1. Cobol file sort 3.7 1.0 1.5 7.4 9.3
* 2. Micro Focus table sort 3.8 1.5 7.6 9.5
* 3. qsort() 2.5 5.0 6.3
* 4. radix-insertion .5 1.3
* 5. radix to linked list .4
*
* Trees.cbl contains an expanded version of these test results.
working-storage section.
01 test-data GLOBAL.
05 radix-columns comp pic s9(02).
05 test-number comp pic s9(02).
05 card occurs 50001 indexed *> the data
x-n first-n last-n previous-n-0 previous-n-1 previous-n-2 x-last.
10 sort-key.
15 key-column occurs 15 indexed x-c comp-x pic xx.
10 next-n index.
10 next-n-1 index.
01 init-data.
05 init-columns value zero comp pic s9(02).
05 init-number value zero comp pic s9(02).
05 init-card occurs 50001 indexed i-n i-next i-last.
10 init-key-1 pic v9(18).
10 init-key-2 pic v9(12).
10 init-next index.
10 init-next-1 index.
01 timer-variables.
05 repeat-factor value 10 comp pic s9(09).
05 previous-key pic x(30).
05 card-count comp pic s9(09).
05 start-time.
10 start-hours pic 9(02).
10 start-minutes pic 9(02).
10 start-seconds pic 9(02).
10 start-hundredths pic 9(02).
05 end-time.
10 end-hours pic 9(02).
10 end-minutes pic 9(02).
10 end-seconds pic 9(02).
10 end-hundredths pic 9(02).
05 elapsed-time pic z(04).9.
procedure division.
perform construct-data
display '0. Indexed file sort' with no advancing
move 0 to init-number
move 2 to repeat-factor
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'indexed-file-sort'
end-perform
perform timer-off
perform verify-sort
display '1. Cobol file sort' with no advancing
move 1 to init-number
move 10 to repeat-factor
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'cobol-sort'
end-perform
perform timer-off
perform verify-sort
display '2. Micro Focus sort' with no advancing
move 2 to init-number
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'mf-sort'
end-perform
perform timer-off
perform verify-sort
display '3. quick sort N log2(N)' with no advancing
move 3 to init-number
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'quick-sort'
end-perform
perform timer-off
perform verify-sort
display '4. radix-insertion kN + 2N+' with no advancing
move 4 to init-number
move 3 to init-columns
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'radix-sort'
end-perform
perform timer-off
move zero to init-columns
perform verify-sort
display '5. radix-insertion to linked list' with no advancing
move 5 to init-number
move 3 to init-columns
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'radix-sort'
end-perform
perform timer-off
goback
. construct-data.
set x-last, i-last to 50001
perform varying i-n from 1 by 1 until i-n = i-last
compute init-key-1 (i-n) = function random
compute init-key-2 (i-n) = function random
set i-next to i-n
set i-next up by 1
set init-next (i-n) to i-next
set init-next-1 (i-n) to i-last
end-perform
move high-values to init-card (i-last)
. initialize-test-data.
move init-data to test-data
set first-n to 1
. verify-sort.
move zero to previous-key, card-count
set x-n to first-n
if x-n equal to x-last
display 'test failed, no first'
end-if
perform until x-n equal to x-last
if sort-key (x-n) less than previous-key
display 'test failed '
previous-key space sort-key (x-n)
set x-n to x-last
else
move sort-key (x-n) to previous-key
if init-columns not equal to zero
set x-n to next-n (x-n)
else
set x-n up by 1
end-if
add 1 to card-count
end-if
end-perform
if card-count not equal to 50000
display 'test failed, count is ' card-count
end-if
. timer-on.
accept start-time from time
. timer-off.
accept end-time from time
compute elapsed-time rounded =
((((((((end-hours * 60) +
end-minutes) * 60) +
end-seconds) * 100) +
end-hundredths) -
((((((start-hours * 60) +
start-minutes) * 60) +
start-seconds) * 100) +
start-hundredths)) / 100)
* 10 / repeat-factor
display elapsed-time ' seconds'
| |
| PaulWDent 2004-07-31, 10:49 am |
| quote: Originally posted by Robert Wagner
This is a re=posting of the sorts demo with one added test: inserting records
into an indexed file and reading them back in sequence. Jerry Mouse said this
technique is "efficient". In FACT, it ran 15 times slower than a Cobol sort and
more than 100 times slower than a fast sort.
* I'll take Sorts And Trees for a thousand, Alex
*
* These two programs -- Sorts and Trees -- demonstrate Cobol Features
* overlooked by most programmers, while running timing tests on
* sort algorithms.
*
* Glossary.
* We will see three data structures:
*
* Table An array defined with Occurs. Same as a sequential file.
*
* List Each entry has a pointer or index containing the address
* of the next entry, and optionally the previous. Same as
* a 'random' file.
*
* Tree Like a family tree, where each node contains one data
* value and pointers to up to two children, named Left,
* containing entries less than the node's data, and Right.
* Nodes with fewer than two children are called Leaves.
* Same as an indexed file.
*
* Sorts program
*
* Feature: Nested Programs
*
* Cobol is often criticized for having 'an ocean of data' in
* working-storage that is globally accessible to all parts of the
* program. It is also criticized for having overly large monolithic
* programs. Nested programs, seen below, are one answer to that
* objection; we will see another in Trees. They are written like
* external called programs, but they are in the same source file
* with the caller. If they were separately compiled, there would
* be additional administrative work to list them on program
* inventories, promote them through Change Management, perhaps
* list them as dependents in make files.
*
* Nested programs have their own private working-storage and lack
* access to their parent(s)' working-storage unless passed as
* a parameter or declared Global. They have been in the ANSI
* Cobol Standard since 1985.
*
* Test 0. Indexed file sort
*
* Inserts records into an indexed file, then reads them back in sequence.
* This is included as a bad example, the wrong way to sort data.
* It runs 15 times slower than a Cobol sort.
*
* Test 1. Cobol file sort
*
* This is a familiar Cobol sort with input and output procedures,
* release and return records. No Features here, it is shown to provide a
* timing baseline.
*
* Test 2. Micro Focus table sort
*
* Feature: Table sort
*
* Micro Focus provides an extension allowing you sort a table in memory.
* When we time it, we find its speed is almost identical to the file
* sort in test 1, and half the speed it 'should' be as shown in test 3.
* Apparently, under the covers it is releasing records to and returning
* records from the file sort. This was an extension to the 1985 Standard,
* is included in the 2002 Standard.
*
* Test 3. Quick sort
*
* Feature: Calling C language functions from Cobol
*
* This test calls qsort() in the Standard C Library, and directs it to
* call memcmp().
*
* How do you tell Cobol to use the C calling convention? Not necessary,
* Micro Focus uses the C convention by default. How do you tell the
* linker (loader) to look in C libraries? Not necessary. We stopped
* doing static binding (link edit) many years ago. Binding is now done
* at execution time by a program named "ld", which follows
* LD_LIBRARY_PATH. The C Standard Library is in that path by default. In
* short, you don't have to do anything special. You can see libraries
* that will be loaded at execution time by typing "ldd sorts".
*
* C libraries contain a plethora of useful functions. See
* http://www.tcnj.edu/~cs/doc/CStdLib.html
* also
* http://www.bebits.com/app/2917
* also
* a hundred others in /usr/lib, many defined in Standards
*
* The speed of quick sort is said to be proportional to N log2(N), where
* N is the number of entries and log2 is the base 2 logarithm. Most
* high-speed algorithms run at this speed, for example Shell, Heap and
* Comb sorts. Low-speed sorts such as Bubble run at N*N, are suitable
* only for small values of N, say less than 100.
*
* Tests 4 & 5. Radix sort followed by Insertion sort
*
* Features: The radix algorithm
* Linked lists using indexes
*
* When we had to sort a large number of punched cards, we put them into
* approximate sequence by sorting on the first few columns with a card
* sorter. Next we put them through a machine called a collator, which
* pulled cards out of order, typically 5-10%. We sorted those cards on
* the full key, then used a collator to merge them into the big deck.
*
* Would you believe the card sorting algorithm, written in Cobol, is
* five times faster than the best high-speed algorithms written in C,
* and about ten times faster than the sorts delivered by Micro Focus?
*
* The algorithm inspects one 'column' at a time, right to left, tossing
* 'cards' into 'pockets' by linking the 'card' to what had previously
* been the last in the 'pocket'. At the end of each pass, it 'gathers'
* the pocket strings into a single string ('deck') by linking the last
* 'card' in a 'pocket' to the first 'card' in the next 'pocket'.
*
* A 'column' can be any convenient size -- one bit, one byte or one
* word. The demo program defines a column as two bytes, 16 bits. This
* requires 65,536 'pockets', of which only a few thousand are actually
* used.
*
* When cards are approximately in order, the program 'merges' the ones
* out of order into place with an Insertion Sort. As a primary
* algorithm, Insertion is slow, close to N*N. But when the entries are
* almost in sequence, it is quite fast. It almost always puts them in
* order with one pass, then makes a second pass to verify that the job
* is complete. It takes a third pass only when there are two consecutive
* steps down. In the 50,000 item test, that doesn't happen, it runs in
* two passes. If radix had sorted on seven bytes rather than six, they
* would have been in perfect order. Insertion would have taken only one
* pass.
*
* The Radix-Insertion algorithm speed is proportional to kN + 2N+, where
* k is the number of columns sorted by Radix (in this case 3) and the +
* following 2N is close-range rearrangements by Insertion. In THEORY,
* the relative speeds of test 3 vs. test 5 should be 50,000 *
* log2(50,000) = 785,000 vs. 3*50,000 + 2*50,000+ =~ 300,000. The ratio
* is 2.6. In PRACTICE, it is 5.0 times as fast. As usual, the devil is
* in the details. If the test program had used subscripts rather than
* indexes, its speed would have been cut in half. Not because indexes
* are that much faster, because every subscript operation generates a
* call to the runtime whereas index operations produce inline code (seen
* via the now commented ASM options).
*
* [If your calculator doesn't have log2, take ln (base e) and divide by .69,
* which is ln(2).]
*
* To compile this "cob -x sorts.cbl". -x produces a Unix executable.
* To execute "sorts".
*
* Continued in trees.cbl
*
* ---------------------------------------------------------------------
$SET SOURCEFORMAT"FREE"
$SET NOBOUND OPT"2" NOTRUNC IBMCOMP NOCHECK
* $SET ASM ASMLIST SOURCEASM
identification division.
program-id. Sorts.
author. Robert Wagner.
* Compare speed of four sorts.
* Results: Sun SPARC, 1.8 GHz, 50,000 30-byte entries, time in
* seconds for 10 iterations.
* --- ratio ---
* 2. 3. 4. 5.
* 0. Indexed file sort 55.4 14.6 22.2 110 139
* 1. Cobol file sort 3.7 1.0 1.5 7.4 9.3
* 2. Micro Focus table sort 3.8 1.5 7.6 9.5
* 3. qsort() 2.5 5.0 6.3
* 4. radix-insertion .5 1.3
* 5. radix to linked list .4
*
* Trees.cbl contains an expanded version of these test results.
working-storage section.
01 test-data GLOBAL.
05 radix-columns comp pic s9(02).
05 test-number comp pic s9(02).
05 card occurs 50001 indexed *> the data
x-n first-n last-n previous-n-0 previous-n-1 previous-n-2 x-last.
10 sort-key.
15 key-column occurs 15 indexed x-c comp-x pic xx.
10 next-n index.
10 next-n-1 index.
01 init-data.
05 init-columns value zero comp pic s9(02).
05 init-number value zero comp pic s9(02).
05 init-card occurs 50001 indexed i-n i-next i-last.
10 init-key-1 pic v9(18).
10 init-key-2 pic v9(12).
10 init-next index.
10 init-next-1 index.
01 timer-variables.
05 repeat-factor value 10 comp pic s9(09).
05 previous-key pic x(30).
05 card-count comp pic s9(09).
05 start-time.
10 start-hours pic 9(02).
10 start-minutes pic 9(02).
10 start-seconds pic 9(02).
10 start-hundredths pic 9(02).
05 end-time.
10 end-hours pic 9(02).
10 end-minutes pic 9(02).
10 end-seconds pic 9(02).
10 end-hundredths pic 9(02).
05 elapsed-time pic z(04).9.
procedure division.
perform construct-data
display '0. Indexed file sort' with no advancing
move 0 to init-number
move 2 to repeat-factor
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'indexed-file-sort'
end-perform
perform timer-off
perform verify-sort
display '1. Cobol file sort' with no advancing
move 1 to init-number
move 10 to repeat-factor
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'cobol-sort'
end-perform
perform timer-off
perform verify-sort
display '2. Micro Focus sort' with no advancing
move 2 to init-number
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'mf-sort'
end-perform
perform timer-off
perform verify-sort
display '3. quick sort N log2(N)' with no advancing
move 3 to init-number
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'quick-sort'
end-perform
perform timer-off
perform verify-sort
display '4. radix-insertion kN + 2N+' with no advancing
move 4 to init-number
move 3 to init-columns
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'radix-sort'
end-perform
perform timer-off
move zero to init-columns
perform verify-sort
display '5. radix-insertion to linked list' with no advancing
move 5 to init-number
move 3 to init-columns
perform timer-on
perform repeat-factor times
perform initialize-test-data
call 'radix-sort'
end-perform
perform timer-off
goback
.. construct-data.
set x-last, i-last to 50001
perform varying i-n from 1 by 1 until i-n = i-last
compute init-key-1 (i-n) = function random
compute init-key-2 (i-n) = function random
set i-next to i-n
set i-next up by 1
set init-next (i-n) to i-next
set init-next-1 (i-n) to i-last
end-perform
move high-values to init-card (i-last)
.. initialize-test-data.
move init-data to test-data
set first-n to 1
.. verify-sort.
move zero to previous-key, card-count
set x-n to first-n
if x-n equal to x-last
display 'test failed, no first'
end-if
perform until x-n equal to x-last
if sort-key (x-n) less than previous-key
display 'test failed '
previous-key space sort-key (x-n)
set x-n to x-last
else
move sort-key (x-n) to previous-key
if init-columns not equal to zero
set x-n to next-n (x-n)
else
set x-n up by 1
end-if
add 1 to card-count
end-if
end-perform
if card-count not equal to 50000
display 'test failed, count is ' card-count
end-if
.. timer-on.
accept start-time from time
.. timer-off.
accept end-time from time
compute elapsed-time rounded =
((((((((end-hours * 60) +
end-minutes) * 60) +
end-seconds) * 100) +
end-hundredths) -
((((((start-hours * 60) +
start-minutes) * 60) +
start-seconds) * 100) +
start-hundredths)) / 100)
* 10 / repeat-factor
display elapsed-time ' seconds'
|
|
|
|
|