Cobol Myth Busters
| Robert 2007-08-31, 9:57 pm |
| In the Micro Focus manual Server Express (2.2 & 4.0):Program Development, chapter 1 part 1
is titled Writing Efficient Programs. Its top billing tells us they think speed is a Very
Important Topic we should know about. For fun, I put their advice to the test.
The machine I used is a high-end HP Superdome with 64 PA (RISC) processors. Of course,
the Cobol test program was only using one of them. For general reference, other timing
tests showed mid-range Sun SPARC CPUs to be 3 times faster than the PA, and HP Superdomes
with Itaniums to be 6-10 times faster. Despite that, customer demand forced HP to rescind
its decision to obsolete the PA. These tests were run on a 'new generation' PA.
I added a few comparisons that are not from the MF manual, but are widely believed in the
Cobol community. They are styled "Legacy:". Execution times are in microseconds (us), with
a resolution of plus or minus 5. I'll describe the timing methodology toward the end; for
now, take my word that the speeds are accurate.
Proposition: Use simple two-operand arithmetic statements wherever possible.
Test:
05 binary-number binary pic s9(09) sync.
add 1 to binary-number *> 1 us
compute binary-number = binary-number + 1 *> 1 us
add 1 to binary-number
multiply 5 by binary-number
divide 5 into binary-number *> 50 us
compute binary-number = ((binary-number + 1) * 5) / 5 *> 445 us
Finding: busted for simple cases, confirmed for cases with more than one operation.
Proposition: "Do not use the REMAINDER, ROUNDED, ON SIZE ERROR or CORRESPONDING phrases if
you want the fastest performance. No optimization is done on arithmetic statements if the
ON SIZE ERROR phrase is used. For this reason, we recommend you do not use this phrase if
high performance is required. The ROUNDED phrase impacts performance, but it is generally
faster to use ROUNDED than try to round the result using your own routine. "
Test:
compute binary-number rounded = binary-number + 1 *> 1 us (no penalty)
add 1 to binary-number *> 15 us
on size error display 'overflow'
end-add
Finding: busted for rounded, confirmed for size error.
Legacy belief: indexes are faster than subscripts
Test:
05 s-subscript binary pic s9(09) sync.
01 misaligned-area sync.
05 array-element occurs 4096 indexed x-index.
10 misaligned-number comp-5 pic s9(09).
10 to-cause-misalignment pic x(01).
move array-element (s-subscript) to test-byte *> 3 us
move array-element (x-index) to test-byte *> 6 us
Finding: BUSTED. Index is actually slower.
Proposition: When incrementing or decrementing a counter, terminate it with a literal
value rather than a value held in a data item. For example, to execute a loop n times, set
the counter to n and then decrement the counter until it becomes zero, rather than
incrementing the counter from zero to n.
Test:
perform varying binary-number from 10 by -1 until binary-number = 0 *> 150 us
perform varying binary-number from 1 by 1 until binary-number > 10 *> 154 us
Finding: BUSTED
Proposition: Access to tables defined with OCCURS ... DEPENDING is less efficient than
access to tables of fixed size, and so should be avoided where high performance is needed.
Test:
01 depending-area.
05 depending-element occurs 1 to 4096 depending on binary-number.
10 comp-5 pic s9(09).
10 pic x(01).
move array-element (s-subscript) to test-byte *> 3 us
move depending-element (s-subscript) to test-byte *> 3 us
Finding: BUSTED
Proposition: Arithmetic on COMP-3 data items is performed in packed decimal and is much
slower than arithmetic on COMP items. It should be avoided.
Test:
05 display-number pic 9(09).
05 packed-number comp-3 pic s9(09).
add 1 to display-number *> 174 us
add 1 to packed-number *> 160 us
Finding: CONFIRMED. Packed is almost as slow as display. It was fast on 1970-era
mainframes. There is no longer any reason to use it. If you want to save space, look at
space-filled strings and filler-padding.
To be continued with the most unexpected and interesting case: does aligning numbers on
memory boundaries matter?
| |
| Roger While 2007-09-01, 7:57 am |
| Absolute rubbish.
You need to do an inline PERFORM of
at least a million iterations to determine this.
In fact, BINARY (aka COMP) is big endian (generally).
So anyway your tests are invalid.
(They force an endian swap on little endian)
You should be using BINARY-LONG (aka COMP-5)
Alignment DOES matter on machines where
this is not tolerated.
I have done all this machine stuff with OC.
Roger
| |
| Robert 2007-09-01, 7:57 am |
| On Sat, 1 Sep 2007 11:56:43 +0200, "Roger While" <simrw@sim-basis.de> wrote:
>Absolute rubbish.
Thanks for the erudite rebuttal.
>You need to do an inline PERFORM of
>at least a million iterations to determine this.
I did 100 million.
>In fact, BINARY (aka COMP) is big endian (generally).
>So anyway your tests are invalid.
>(They force an endian swap on little endian)
The PA processor is little endian, so there's no difference between BINARY and COMP-5.
Try again when someone posts timing tests on an Intel or Alpha.
>You should be using BINARY-LONG (aka COMP-5)
I didn't post a comparison because most people find no difference boring.
>Alignment DOES matter on machines where
>this is not tolerated.
Modern machines have two or three levels of cache between the CPU and memory. There are no
alignment issues in a cache. But compilers that THINK alignment is important shoot
themselves in the foot by generating extra instructions to align the number to speed things
up. The extra instructions are counterproductive, they actually slow things down.
>I have done all this machine stuff with OC.
What's OC?
| |
| Roger While 2007-09-01, 7:57 am |
| "Robert" <no@e.mail> wrote in message
news:0ukid3h951nksjv34nttgko2i2k6di7cn5@4ax.com...
> On Sat, 1 Sep 2007 11:56:43 +0200, "Roger While" <simrw@sim-basis.de>
> wrote:
>
>
> Thanks for the erudite rebuttal.
>
>
> I did 100 million.
Super, post the program.
Do not do calculations in your head:-)
>
>
> The PA processor is little endian, so there's no difference between BINARY
> and COMP-5.
> Try again when someone posts timing tests on an Intel or Alpha.
>
>
> I didn't post a comparison because most people find no difference boring.
>
Really, this IS a major issue when doing big/little-endian.
>
> Modern machines have two or three levels of cache between the CPU and
> memory. There are no
> alignment issues in a cache. But compilers that THINK alignment is
> important shoot
> themselves in the foot by generating extra instructions to align the number
> to speed things
> up. The extra instructions are counterproductive, they actually slow
> things down.
>
>
> What's OC?
>
Follow the links here :-)
Roger
| |
| Pete Dashwood 2007-09-01, 6:58 pm |
|
"Robert" <no@e.mail> wrote in message
news:0ukid3h951nksjv34nttgko2i2k6di7cn5@
4ax.com...
> On Sat, 1 Sep 2007 11:56:43 +0200, "Roger While" <simrw@sim-basis.de>
> wrote:
>
>
> Thanks for the erudite rebuttal.
>
>
> I did 100 million.
>
>
> The PA processor is little endian, so there's no difference between BINARY
> and COMP-5.
> Try again when someone posts timing tests on an Intel or Alpha.
>
>
> I didn't post a comparison because most people find no difference boring.
>
>
> Modern machines have two or three levels of cache between the CPU and
> memory. There are no
> alignment issues in a cache. But compilers that THINK alignment is
> important shoot
> themselves in the foot by generating extra instructions to align the number
> to speed things
> up. The extra instructions are counterproductive, they actually slow
> things down.
>
>
> What's OC?
>
It is Open COBOL. Roger is one of the people working on it.
I have no axe to grind, but I did find some of your results eyebrow-raising.
Have you thought carefully about exactly how "unbiased" your tests are?
Pete.
--
"I used to write COBOL...now I can do anything."
| |
| Robert 2007-09-01, 6:58 pm |
| On Sat, 1 Sep 2007 14:40:01 +0200, "Roger While" <simrw@sim-basis.de> wrote:
>"Robert" <no@e.mail> wrote in message
> news:0ukid3h951nksjv34nttgko2i2k6di7cn5@4ax.com...
>
>Really, This IS a major issue when doing big/little-endian.
Conversion is a MINOR issue. It takes one instruction -- xchg al,ah -- for 16 bit and
three instructions -- xchg ah, al, ror eax, 16, xchg ah, al -- for 32 bit.
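
Whether that swap is paid at all depends on how the compiler stores BINARY on a given platform. A minimal sketch of a check, assuming free-format source (for example `cobc -x -free` with GnuCOBOL) and a 4-byte allocation for `pic s9(09)`; the program and data names are illustrative and do not come from the posted test program:

```cobol
*> Sketch only: names are hypothetical, and a 4-byte allocation for
*> pic s9(09) is assumed for both usages.
identification division.
program-id. ByteOrder.
data division.
working-storage section.
01  binary-item         binary pic s9(09) value 1.
01  binary-bytes        redefines binary-item pic x(04).
01  native-item         comp-5 pic s9(09) value 1.
01  native-bytes        redefines native-item pic x(04).
procedure division.
    *> The same value is stored in both items. If the raw bytes differ,
    *> BINARY is not held in the machine's native byte order here.
    if binary-bytes = native-bytes
        display 'BINARY and COMP-5 use the same byte order here'
    else
        display 'BINARY and COMP-5 differ in byte order here'
    end-if
    stop run.
```

If the two layouts differ, arithmetic on the BINARY item has to convert on the way in and out, which is the cost being argued about; if they match, BINARY and COMP-5 should behave alike in timing tests.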
How do you handle bi-endian machines such as Itanium and PowerPC? The compiler doesn't
know the machine's state at execution time. A compiler running under Linux thinks the
Itanium is big endian. An LPAR running HP-UX on the same machine sees the world as little
endian. Conversions are handled by an emulator, not the compiler.
Things change. The PA alignment instructions speeded things up in the late '80s. Now they
slow things down, especially on machines with an L2 cache such as the PA-7300 and 8800.
As religious wars rage, the poor compiler is forever playing catchup. This is a good
reason to separate code generation from the compiler, as done by GCC and Mercury.
I see OC does that by using the GCC C compiler as its back end. The problem with that
approach is you can't generate inline code for Cobol things that have no corresponding C
syntax. For instance, a SEARCH or STRING looking for a one byte delimiter on Intel SHOULD
generate an inline REPNE SCASB. There's no way to say that in C; you have to call a
function.
| |
| Doug Miller 2007-09-01, 6:58 pm |
| In article <dldhd39vccjgdgs3g1572hbi2eq2suil7u@4ax.com>, Robert <no@e.mail> wrote:
[snip]
>Proposition: Use simple two-operand arithmetic statements wherever possible.
>
>Test:
>05 binary-number binary pic s9(09) sync.
>
>add 1 to binary-number *> 1 us
>compute binary-number = binary-number + 1 *> 1 us
>
>add 1 to binary-number
>multiply 5 by binary-number
>divide 5 into binary-number
> *> 50 us
>
>compute binary-number = ((binary-number + 1) * 5) / 5 *> 445 us
>
>Finding: busted for simple cases, confirmed for cases with more than one
> operation.
Correct interpretation of findings:
Unconfirmed for a single case consisting of a single operation.
Confirmed for a *single*case* (not cases, plural, as incorrectly stated) with
one operation involving integer arithmetic.
The testing conducted was insufficient, in terms both of types and of
numbers of cases, to permit any valid conclusions to be drawn. Further testing
with larger numbers of simple, complex, and intermediate cases involving both
integers and decimal fractions, with varying USAGEs, is needed in order to draw
any valid conclusions.
>Proposition: "Do not use the REMAINDER, ROUNDED, ON SIZE ERROR or CORRESPONDING
> phrases if
>you want the fastest performance. No optimization is done on arithmetic statements if the
>ON SIZE ERROR phrase is used. For this reason, we recommend you do not use this phrase if
>high performance is required. The ROUNDED phrase impacts performance, but it is generally
>faster to use ROUNDED than try to round the result using your own routine. "
>
>Test:
>compute binary-number rounded = binary-number + 1 *> 1 us (no penalty)
>add 1 to binary-number *> 15 us
> on size error display 'overflow'
>end-add
>
>Finding: busted for rounded, confirmed for size error.
Correct interpretation of findings:
As with the previous "test", the testing conducted was insufficient to permit
any valid conclusions to be drawn.
Proposition is confirmed with respect to SIZE ERROR in one simple case.
Additional tests needed, using a variety of PICtures and USAGEs, to determine
whether this case is the general rule, or a fortuitous exception.
Valid test needed to determine effect with ROUNDED. _Of_course_ there's no
penalty for using ROUNDED on an *integer* operation. Why would you expect
otherwise, and why would you expect this test to tell you anything at all?
Effects of REMAINDER and CORRESPONDING not tested.
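
A test in which ROUNDED has real work to do might look like the sketch below. It assumes free-format source; the data names, the repeat count, and the ACCEPT ... FROM TIME timing are illustrative choices, not the harness from the posted program:

```cobol
*> Sketch only: names and the repeat count are illustrative assumptions.
identification division.
program-id. RoundedCost.
data division.
working-storage section.
01  source-amount       binary pic s9(05)v999 value 1.005.
01  truncated-result    binary pic s9(05)v99.
01  rounded-result      binary pic s9(05)v99.
01  repeat-factor       binary pic s9(09) value 10000000.
01  clock-reading.
    05  clock-hours       pic 9(02).
    05  clock-minutes     pic 9(02).
    05  clock-seconds     pic 9(02).
    05  clock-hundredths  pic 9(02).
01  time-now            pic 9(08).
01  time-start          pic 9(08).
01  elapsed-hundredths  pic s9(08).
procedure division.
main-line.
    *> 1.005 * 3 = 3.015, so the third decimal place must be dropped.
    *> Without ROUNDED the result is truncated to 3.01.
    perform read-clock
    move time-now to time-start
    perform repeat-factor times
        compute truncated-result = source-amount * 3
    end-perform
    perform read-clock
    compute elapsed-hundredths = time-now - time-start
    display 'truncated: ' truncated-result ' time: ' elapsed-hundredths

    *> With ROUNDED the same expression yields 3.02.
    perform read-clock
    move time-now to time-start
    perform repeat-factor times
        compute rounded-result rounded = source-amount * 3
    end-perform
    perform read-clock
    compute elapsed-hundredths = time-now - time-start
    display 'rounded:   ' rounded-result ' time: ' elapsed-hundredths
    stop run.

read-clock.
    *> Wall-clock time as hundredths of a second since midnight.
    *> A run that crosses midnight would give a negative elapsed time.
    accept clock-reading from time
    compute time-now =
        ((clock-hours * 60 + clock-minutes) * 60 + clock-seconds) * 100
        + clock-hundredths.
```

The two loops differ only in the ROUNDED phrase, so any gap between the two times can be attributed to it. Repeating the pair with DISPLAY and COMP-3 operands would cover the other USAGEs.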
>Legacy belief: indexes are faster than subscripts
>
>Test:
>05 s-subscript binary pic s9(09) sync.
>01 misaligned-area sync.
> 05 array-element occurs 4096 indexed x-index.
> 10 misaligned-number comp-5 pic s9(09).
> 10 to-cause-misalignment pic x(01).
>move array-element (s-subscript) to test-byte *> 3 us
>move array-element (x-index) to test-byte *> 6 us
>
>Finding: BUSTED. Index is actually slower.
Correct interpretation of findings:
Too many variable conditions are present to allow valid conclusions
to be drawn. Additional tests needed to determine outcome, specifically (but
not necessarily limited to):
a) second test should be conducted with properly aligned data items, to
eliminate misalignment as a contributing factor;
b) third test should be conducted with USAGE DISPLAY data items, to eliminate
all alignment issues as contributing factors;
c) fourth test should be conducted using separate arrays for the subscripted
and indexed accesses, to eliminate INDEXED BY in the definition of the array
as a possible factor in speeding up subscripted access.
>
>Proposition: When incrementing or decrementing a counter, terminate it with a literal
>value rather than a value held in a data item. For example, to execute a loop n times, set
>the counter to n and then decrement the counter until it becomes zero, rather than
>incrementing the counter from zero to n.
>
>Test:
>perform varying binary-number from 10 by -1 until binary-number = 0 *> 150 us
>perform varying binary-number from 1 by 1 until binary-number > 10 *> 154 us
>
>Finding: BUSTED
Correct interpretation of findings:
Baloney. The proposition was not tested at all. Each test case compared the
counter to a literal, not to a data item, so it's hardly surprising that the
difference is so small.
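
A comparison that does exercise the proposition needs one loop that terminates on a literal and one that terminates on a data item. A minimal sketch, assuming free-format source; `loop-limit`, the repeat count, and the timing paragraph are illustrative and not taken from the posted program:

```cobol
*> Sketch only: names and the repeat count are illustrative assumptions.
identification division.
program-id. LoopLimit.
data division.
working-storage section.
01  loop-counter        binary pic s9(09) sync.
01  loop-limit          binary pic s9(09) sync value 10.
01  repeat-factor       binary pic s9(09) value 10000000.
01  clock-reading.
    05  clock-hours       pic 9(02).
    05  clock-minutes     pic 9(02).
    05  clock-seconds     pic 9(02).
    05  clock-hundredths  pic 9(02).
01  time-now            pic 9(08).
01  time-start          pic 9(08).
01  elapsed-hundredths  pic s9(08).
procedure division.
main-line.
    *> Case 1: count down and terminate on the literal 0.
    perform read-clock
    move time-now to time-start
    perform repeat-factor times
        perform varying loop-counter from 10 by -1
                until loop-counter = 0
            continue
        end-perform
    end-perform
    perform read-clock
    compute elapsed-hundredths = time-now - time-start
    display 'terminate on literal 0: ' elapsed-hundredths

    *> Case 2: count up and terminate on a data item.
    perform read-clock
    move time-now to time-start
    perform repeat-factor times
        perform varying loop-counter from 1 by 1
                until loop-counter > loop-limit
            continue
        end-perform
    end-perform
    perform read-clock
    compute elapsed-hundredths = time-now - time-start
    display 'terminate on data item: ' elapsed-hundredths
    stop run.

read-clock.
    *> Wall-clock time as hundredths of a second since midnight.
    *> A run that crosses midnight would give a negative elapsed time.
    accept clock-reading from time
    compute time-now =
        ((clock-hours * 60 + clock-minutes) * 60 + clock-seconds) * 100
        + clock-hundredths.
```

Both loops run ten iterations per pass, so the only intended difference is what the counter is compared against. An optimizing compiler may treat the empty loop bodies specially, so the result is worth confirming with a small amount of real work inside each loop.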
>
>Proposition: Access to tables defined with OCCURS ... DEPENDING is less efficient than
>access to tables of fixed size, and so should be avoided where high performance is needed.
>
>Test:
>
>01 depending-area.
> 05 depending-element occurs 1 to 4096 depending on binary-number.
> 10 comp-5 pic s9(09).
> 10 pic x(01).
>move array-element (s-subscript) to test-byte *> 3 us
>move depending-element (s-subscript) to test-byte *> 3 us
>
>Finding: BUSTED
Correct interpretation of findings:
The testing conducted is grossly insufficient to permit any valid conclusions
to be drawn.
The *reporting* of what little testing was done is *also* grossly insufficient
to permit assessing the validity of that minimal testing. Specifically, it is
necessary to see the definitions of array-element, s-subscript, and test-byte.
Additional testing is needed, including (but not necessarily limited to):
a) Examine the results of comparing
OCCURS 10 vs. OCCURS 1 TO 10
OCCURS 100 vs. OCCURS 1 TO 100
OCCURS 1000 vs. OCCURS 1 TO 1000
etc. to determine if array size has any effect. Make sure that at least one
of these tests uses the largest array size permitted by the compiler.
b) Examine the results of comparing OCCURS 1000 vs OCCURS 1 TO 1000 vs OCCURS
500 TO 1000, e.g, and other similar tests, to determine if the *lower* bound
has any effect. Again, make sure that at least one of these tests uses the
largest array size permitted by the compiler.
c) Compare the results of
OCCURS 4000 vs. OCCURS 1 TO 4000
with the results of
OCCURS 4096 vs. OCCURS 1 TO 4096
to eliminate the [admittedly unlikely] possibility that the array size being
an exact power of two has anything to do with the results.
d) Repeat the one test conducted, changing the USAGEs of all data elements to
DISPLAY, to eliminate alignment issues as a contributing factor.
e) Repeat the tests described in a) and b) above, varying the USAGE of the
DEPENDING item to determine what, if any, difference this makes.
>Proposition: Arithmetic on COMP-3 data items is performed in packed decimal and is much
>slower than arithmetic on COMP items. It should be avoided.
>
>Test:
>05 display-number pic 9(09).
>05 packed-number comp-3 pic s9(09).
>
>add 1 to display-number *> 174 us
>add 1 to packed-number *> 160 us
>
>Finding: CONFIRMED. Packed is almost as slow as display. It was fast on 1970-era
>mainframes. There is no longer any reason to use it. If you want to save space, look at
>space-filled strings and filler-padding.
Correct interpretation of finding: Baloney. The proposition was not tested at
all, and no valid finding is possible.
Valid test needs to be conducted, comparing the execution speed of
instructions involving COMP-3 vs COMP data, rather than COMP-3 vs DISPLAY, and
using data items with identical PICtures.
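
Such a test could be as small as the sketch below. It assumes free-format source; the data names, the repeat count, and the timing paragraph are illustrative, not the harness from the posted program:

```cobol
*> Sketch only: names and the repeat count are illustrative assumptions.
identification division.
program-id. PackedVsBinary.
data division.
working-storage section.
*> Identical PICtures; only the USAGE differs.
01  binary-number       comp   pic s9(09) value zero.
01  packed-number       comp-3 pic s9(09) value zero.
01  repeat-factor       binary pic s9(09) value 10000000.
01  clock-reading.
    05  clock-hours       pic 9(02).
    05  clock-minutes     pic 9(02).
    05  clock-seconds     pic 9(02).
    05  clock-hundredths  pic 9(02).
01  time-now            pic 9(08).
01  time-start          pic 9(08).
01  elapsed-hundredths  pic s9(08).
procedure division.
main-line.
    perform read-clock
    move time-now to time-start
    perform repeat-factor times
        add 1 to binary-number
    end-perform
    perform read-clock
    compute elapsed-hundredths = time-now - time-start
    display 'COMP:   ' elapsed-hundredths

    perform read-clock
    move time-now to time-start
    perform repeat-factor times
        add 1 to packed-number
    end-perform
    perform read-clock
    compute elapsed-hundredths = time-now - time-start
    display 'COMP-3: ' elapsed-hundredths
    stop run.

read-clock.
    *> Wall-clock time as hundredths of a second since midnight.
    *> A run that crosses midnight would give a negative elapsed time.
    accept clock-reading from time
    compute time-now =
        ((clock-hours * 60 + clock-minutes) * 60 + clock-seconds) * 100
        + clock-hundredths.
```

The repeat count stays below the nine-digit capacity of both items, so neither ADD overflows during the run.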
>To be continued with the most unexpected and interesting case: does aligning numbers on
>memory boundaries matter?
Perhaps this time you can manage to devise some valid, *comprehensive* test
cases, conduct them properly, report them completely, and interpret the
results correctly.
--
Regards,
Doug Miller (alphag at milmac dot com)
It's time to throw all their damned tea in the harbor again.
| |
| Alistair 2007-09-01, 6:59 pm |
| I have no wish to criticise your findings but I do have two points to
make:
Robert wrote:
>
> Proposition: Use simple two-operand arithmetic statements wherever possible.
>
> Test:
> 05 binary-number binary pic s9(09) sync.
>
> add 1 to binary-number *> 1 us
> compute binary-number = binary-number + 1 *> 1 us
>
> add 1 to binary-number
> multiply 5 by binary-number
> divide 5 into binary-number *> 50 us
>
> compute binary-number = ((binary-number + 1) * 5) / 5 *> 445 us
>
> Finding: busted for simple cases, confirmed for cases with more than one operation.
This proposition, I believe, derived from the early days when (perhaps
DD can cast his mind back that far and confirm it for us?) the COMPUTE
verb was shown to be less efficient than a multitude of other verbs
that accomplished the same task. Times move on and the COMPUTE verb is
no longer as inefficient as it was once. Personally, I would prefer to
use a complex COMPUTE rather than a series of simple verbs as I
believe that the COMPUTE, because it most closely resembles the
equation being represented, is a better form of self-documentation.
>
> Proposition: "Do not use the REMAINDER, ROUNDED, ON SIZE ERROR or CORRESPONDING phrases if
> you want the fastest performance. No optimization is done on arithmetic statements if the
> ON SIZE ERROR phrase is used. For this reason, we recommend you do not use this phrase if
> high performance is required. The ROUNDED phrase impacts performance, but it is generally
> faster to use ROUNDED than try to round the result using your own routine. "
>
>
> Finding: BUSTED. Index is actually slower.
I'm not really fussed by this but I think that making the algorithm
more obvious has some benefits so I am happy using these clauses (and
would be even where proven to be inefficient).
>
> Proposition: When incrementing or decrementing a counter, terminate it with a literal
> value rather than a value held in a data item. For example, to execute a loop n times, set
> the counter to n and then decrement the counter until it becomes zero, rather than
> incrementing the counter from zero to n.
>
> Test:
> perform varying binary-number from 10 by -1 until binary-number = 0 *> 150 us
> perform varying binary-number from 1 by 1 until binary-number > 10 *> 154 us
>
> Finding: BUSTED
The difference here is going to be so minuscule as to be hardly worth
mentioning. The reason you use a literal is because, if you use a data-
item, the format and location of the data-item has to be recalculated
on each execution of the loop. However, you are talking about a few
cpu cycles per loop. So no real problem. If you are bothered about cpu
cycles, then you should avoid ODO and anything else that requires the
machine to calculate displacements.
>
> Proposition: Access to tables defined with OCCURS ... DEPENDING is less efficient than
> access to tables of fixed size, and so should be avoided where high performance is needed.
>
>
> Finding: BUSTED
I don't think that there is much difference in calculating the
location of a data-item located in an ODO table as compared to a fixed
table size. You would be better off using specific data items (eg
data-1, data-2, data-3....) if those few cpu cycles bother you that
much.
>
> Proposition: Arithmetic on COMP-3 data items is performed in packed decimal and is much
> slower than arithmetic on COMP items. It should be avoided.
>
>
> Finding: CONFIRMED. Packed is almost as slow as display. It was fast on 1970-era
> mainframes. There is no longer any reason to use it. If you want to save space, look at
> space-filled strings and filler-padding.
>
This one I find very interesting, as it is actually machine dependent.
What I mean is that on an IBM mainframe, where PACKED is one of the
machine implementations then PACKED DECIMAL operations will, probably,
be faster than ZONED DECIMAL but not as fast as BINARY operations.
However, where PACKED is not a NATIVE mode then you will find that
the additional requirement to convert between data formats will
severely impact any operation using the non-native PACKED mode. I can
confirm this because I tried this experiment using BINARY and PACKED
data on Natural programs (Yes, I know it isn't Cobol but....) running
on a dedicated PC and the PACKED code ran slowly. No surprise.
> To be continued with the most unexpected and interesting case: does aligning numbers on
> memory boundaries matter?
YES!!!!! If you use Assembler, IT DOES.
| |
| Alistair 2007-09-01, 6:59 pm |
|
Robert wrote:
> In the Micro Focus manual Server Express (2.2 & 4.0):Program Development, chapter 1 part 1
> is titled Writing Efficient Programs. Its top billing tells us they think speed is a Very
> Important Topic we should know about. For fun, I put their advice to the test.
>
< BIG SNIP >
I should have added that, in my not-very-humble opinion, the machine
cycles used are much less important to me than the ease of maintenance
of the code. It costs pennies (alright, it can cost a fortune if the
code is really shite) to run inefficient code but it costs pounds (or
2*dollars) to maintain it.
Efficient coding should be encouraged and rewarded. Coding according
to outdated and misunderstood standards should be discouraged. In-line
documentation should be encouraged. Those three will save pounds
(or 2*dollars) and headaches.
| |
| Jeff Campbell 2007-09-01, 6:59 pm |
| Robert wrote:
> On Sat, 1 Sep 2007 11:56:43 +0200, "Roger While" <simrw@sim-basis.de> wrote:
>
>
> Thanks for the erudite rebuttal.
>
>
> I did 100 million.
>
>
> The PA processor is little endian, so there's no difference between BINARY and COMP-5.
> Try again when someone posts timing tests on an Intel or Alpha.
Alphas are bi-endian. That is, the chip supports running in either mode.
The DEC OSs running on Alphas, OpenVMS, Tru64 UNIX and the port of WNT4,
run the CPU(s) little endian. The Linux distributions I have used on
Alphas, Red Hat, Debian and SuSE, are also little endian.
If you can post your test code I'll post the results I get on my
PWS 600au running VMS.
>
>
> I didn't post a comparison because most people find no difference boring.
>
>
> Modern machines have two or three levels of cache between the CPU and memory. There are no
> alignment issues in a cache. But compilers that THINK alignment is important shoot
> themselves in the foot by generating extra instructions to align the number to speed things
> up. The extra instructions are counterproductive, they actually slow things down.
>
>
> What's OC?
>
Jeff
| |
| Robert 2007-09-02, 3:55 am |
| On Sat, 01 Sep 2007 16:15:22 -0600, Jeff Campbell <n8wxs@arrl.net> wrote:
>If you can post your test code I'll post the results I get on my
>PWS 600au running VMS.
Here it is:
* ---------------------------------------------------------------------
* Findings (microseconds)
*   Aligned                  1
*   Unaligned               15
*   Misaligned (1)           5
*   Misaligned (2)           4
*   Binary                   1
*   Linkage                 30
*   Compute n=n+1            1
*   Rounded                  1
*   size error              18
*   Display                174
*   Packed                 160
*   Arithmetic              50
*   Compute                445
*   Index                    6
*   Subscript                3
*   Depending                3
*   Evaluate true            2
*   Evaluate expression      3
*   Go to depending          7
*   Evaluate case           11
*   Initialize             346
*   Move zeros             339
*   Dec to zero            149
*   Inc to 10              154
$SET SOURCEFORMAT"FREE"
$SET NOBOUND
$SET OPT"2"
$SET NOTRUNC
$SET IBMCOMP
$SET NOCHECK
$SET ALIGN"8"
identification division.
program-id. Speed1.
author. Robert Wxagner.
data division.
working-storage section.
01  test-data.
    05  comp5-number        comp-5 pic s9(09) sync.
    05  test-byte           pic x(01).
    05  unaligned-number    comp-5 pic s9(09).
    05                      pic x(03).
    05  binary-number       binary pic s9(09) sync.
    05  display-number      pic 9(09).
    05  packed-number       comp-3 pic s9(09).
    05  s-subscript         binary pic s9(09) sync.
01  depending-area.
    05  depending-element occurs 1 to 4096 depending on binary-number.
        10                  comp-5 pic s9(09).
        10                  pic x(01).
01  misaligned-area sync.
    05  array-element occurs 4096 indexed x-index.
        10  misaligned-number       comp-5 pic s9(09).
        10  to-cause-misalignment   pic x(01).
01  timer-variables.
    05  test-name           pic x(30).
    05  repeat-factor       value 100000000 binary pic s9(09).
    05  current-date-structure.
        10                  pic x(08).
        10  time-now-hhmmsshh.
            15  hours       pic 9(02).
            15  minutes     pic 9(02).
            15  seconds     pic 9(02).
            15  hundredths  pic 9(02).
        10                  pic x(05).
    05  time-now            pic 9(06)v99.
    05  time-start          pic 9(06)v99.
    05  timer-overhead      value zero pic 9(06)v99.
    05  elapsed-time        pic s9(06)v99.
    05  elapsed-time-display.
        10  elapsed-time-edited pic z(05).
linkage section.
01  linkage-number          binary pic s9(09) sync.
procedure division.
initialize test-data, misaligned-area
move 'Null test' to test-name
perform timer-on
perform timer-on
perform repeat-factor times
exit perform cycle
end-perform
perform timer-off
compute timer-overhead = (time-now - time-start)
move 'Aligned' to test-name
perform timer-on
perform repeat-factor times
add 1 to comp5-number
exit perform cycle
end-perform
perform timer-off
move 'Unaligned' to test-name
perform timer-on
perform repeat-factor times
add 1 to unaligned-number
exit perform cycle
end-perform
perform timer-off
move 'Misaligned (1)' to test-name
move 1 to s-subscript
perform timer-on
perform repeat-factor times
add 1 to misaligned-number (s-subscript)
exit perform cycle
end-perform
perform timer-off
move 'Misaligned (2)' to test-name
*> if this is faster than Unaligned above,
*> compiler generated alignment code is slowing things down
move 2 to s-subscript
perform timer-on
perform repeat-factor times
add 1 to misaligned-number (s-subscript)
exit perform cycle
end-perform
perform timer-off
move 'Binary' to test-name
move zero to binary-number
perform timer-on
perform repeat-factor times
add 1 to binary-number
exit perform cycle
end-perform
perform timer-off
move 'Linkage' to test-name
set address of linkage-number to address of binary-number
move zero to linkage-number
perform timer-on
perform repeat-factor times
add 1 to linkage-number
exit perform cycle
end-perform
perform timer-off
move 'Compute n=n+1' to test-name
move zero to binary-number
perform timer-on
perform repeat-factor times
compute binary-number = binary-number + 1
exit perform cycle
end-perform
perform timer-off
move 'Rounded' to test-name
move zero to binary-number
perform timer-on
perform repeat-factor times
compute binary-number rounded = binary-number + 1
exit perform cycle
end-perform
perform timer-off
move 'size error' to test-name
move zero to binary-number
perform timer-on
perform repeat-factor times
add 1 to binary-number
on size error display 'overflow'
end-add
exit perform cycle
end-perform
perform timer-off
move 'Display' to test-name
perform timer-on
perform repeat-factor times
*> add 1 to display-number
exit perform cycle
end-perform
perform timer-off
move 'Packed' to test-name
perform timer-on
perform repeat-factor times
*> add 1 to packed-number
exit perform cycle
end-perform
perform timer-off
move 'Arithmetic' to test-name
move zero to binary-number
perform timer-on
perform repeat-factor times
add 1 to binary-number
multiply 5 by binary-number
divide 5 into binary-number
exit perform cycle
end-perform
perform timer-off
move 'Compute' to test-name
move zero to binary-number
divide 10 into repeat-factor
perform timer-on
perform repeat-factor times
compute binary-number = ((binary-number + 1) * 5) / 5
exit perform cycle
end-perform
perform timer-off
multiply 10 by repeat-factor
move 'Index' to test-name
set x-index to 1000
perform timer-on
perform repeat-factor times
move array-element (x-index) to test-byte
exit perform cycle
end-perform
perform timer-off
move 'Subscript' to test-name
move 1000 to s-subscript
perform timer-on
perform repeat-factor times
move array-element (s-subscript) to test-byte
exit perform cycle
end-perform
perform timer-off
move 'Depending' to test-name
move 2000 to binary-number
move 1000 to s-subscript
perform timer-on
perform repeat-factor times
move depending-element (s-subscript) to test-byte
exit perform cycle
end-perform
perform timer-off
move 'Evaluate true' to test-name
move zero to binary-number
perform timer-on
perform repeat-factor times
evaluate true
when binary-number equal to zero
exit perform cycle
when other
display 'error'
end-evaluate
end-perform
perform timer-off
move 'Evaluate expression' to test-name
move zero to binary-number
perform timer-on
perform repeat-factor times
evaluate binary-number
when zero
exit perform cycle
when other
display 'error'
end-evaluate
end-perform
perform timer-off
move 'Go to depending' to test-name
move 2 to binary-number
perform timer-on
perform go-depending-test repeat-factor times
perform timer-off
move 'Evaluate case' to test-name
move 2 to binary-number
perform timer-on
perform evaluate-case-test repeat-factor times
perform timer-off
move 'Initialize' to test-name
perform timer-on
perform repeat-factor times
initialize test-data
exit perform cycle
end-perform
perform timer-off
move 'Move zeros' to test-name
perform timer-on
perform repeat-factor times
move zeros to
comp5-number
test-byte
unaligned-number
binary-number
display-number
packed-number
s-subscript
exit perform cycle
end-perform
perform timer-off
move 'Dec to zero' to test-name
perform timer-on
perform repeat-factor times
perform varying binary-number from 10 by -1 until binary-number
= 0
end-perform
exit perform cycle
end-perform
perform timer-off
move 'Inc to 10' to test-name
perform timer-on
perform repeat-factor times
perform varying binary-number from 1 by 1 until binary-number >
10
end-perform
exit perform cycle
end-perform
perform timer-off
goback
. go-depending-test section.
go to p1 p2 p3 depending on binary-number
display 'error'
. p1. display 'error'
. p2. exit section
. p3. display 'error'
. evaluate-case-test section.
evaluate binary-number
when 1
display 'error'
when 2
exit section
when other
display 'error'
end-evaluate
. end-of-previous section
. timer-on.
perform read-the-time
move time-now to time-start
. timer-off.
perform read-the-time
compute elapsed-time rounded = ((time-now - time-start)
* 100000000 / repeat-factor)
- timer-overhead
if elapsed-time not greater than zero
move 'error' to elapsed-time-display
else
compute elapsed-time-edited rounded = elapsed-time * 10
end-if
display test-name elapsed-time-display
. read-the-time.
accept time-now-hhmmsshh from time
*> move function current-date to current-date-structure
compute time-now =
((((hours * 60) +
minutes) * 60) +
seconds) +
(hundredths / 100)
.
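To sanity-check the units, the arithmetic in timer-off can be replayed on a made-up interval. The sketch below is not part of the test program: it copies the two compute statements, leaves out the overhead subtraction, and feeds in a hypothetical 0.10-second run (sample-seconds and the program name are invented for illustration). It assumes free-format source.

```cobol
identification division.
program-id. TimerUnits.
data division.
working-storage section.
*> Same picture clauses as the fields in the test program.
01  repeat-factor        binary pic s9(09) value 100000000.
01  elapsed-time         pic s9(06)v99.
01  elapsed-time-edited  pic z(05).
*> Hypothetical measurement: the timed loop took 0.10 seconds.
01  sample-seconds       pic 9(06)v99 value 0.10.
procedure division.
    *> First compute from timer-off, without the overhead term:
    *> scales the interval to seconds per 100,000,000 iterations.
    compute elapsed-time rounded =
        sample-seconds * 100000000 / repeat-factor
    *> Second compute from timer-off: the value that gets displayed.
    compute elapsed-time-edited rounded = elapsed-time * 10
    display 'displayed value: ' elapsed-time-edited
    goback.
```

If I have the arithmetic right, a 0.10-second run shows up as 1. Spread over 100,000,000 passes, 0.10 seconds is 1 nanosecond per pass, so one displayed unit looks like a nanosecond per iteration rather than a microsecond. That is worth checking before reading much into the absolute figures; the ratios between tests are not affected either way.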
| |
| Alistair 2007-09-02, 7:56 am |
| On 2 Sep, 02:32, docdw...@panix.com () wrote:
> In article <1188684272.693113.252...@22g2000hsm.googlegroups.com>,
>
> Alistair <alist...@ld50macca.demon.co.uk> wrote:
>
>
> [snip]
>
>
>
> Mr Maclean, I recall being taught something like that lo, those many moons
> ago... but I never tested it and I don't recall ever seeing a PMAP where a
> COMPUTE was shown to be of lesser efficiency than simpler instructions.
>
> My experiences are limited, of course, and my memory is, admittedly,
> porous.
>
Just like Pete's toolbox?
| |
| Alistair 2007-09-02, 7:56 am |
| On 2 Sep, 02:40, "Charles Hottel" <chot...@earthlink.net> wrote:
> "Alistair" <alist...@ld50macca.demon.co.uk> wrote in message
>
> news:1188684627.669404.9690@k79g2000hse.googlegroups.com...
>
>
>
> <snip>
>
>
> <snip>
>
> Which meaning of the word are you using?
> The word shite may refer to various things:
>
> a.. A variant of the word shit
> b.. A shi'ite, a person who practices the Shi'a Islam faith
> c.. The shite, the principal character in a Japanese Noh play
> d.. Shite, the person who performs the technique in aikido
The poo related version.
| |
| Alistair 2007-09-02, 7:56 am |
| On 2 Sep, 03:16, Robert <n...@e.mail> wrote:
> On Sat, 01 Sep 2007 15:04:32 -0700, Alistair <alist...@ld50macca.demon.co.uk> wrote:
>
> That's what I thought. COMPUTE *is* efficient on most compilers.
>
>
> I agree.
>
>
> There you go, repeating a myth about ODO being slow.
>
No myth. It takes more cpu cycles to calculate and then use a
displacement for an ODO data-item referenced by subscript or index
than it does to refer to a fixed-position data-item.
>
>
> Not any longer. It used to be before memory caches. It still is if the compiler generates
> extra instructions intended to save time. Then you have to find ways to blind the compiler
> so it will stop.
| |
| Alistair 2007-09-02, 7:56 am |
| On 2 Sep, 03:52, Robert <n...@e.mail> wrote:
> On Sat, 01 Sep 2007 15:10:27 -0700, Alistair <alist...@ld50macca.demon.co.uk> wrote:
>
>
>
>
> According to Pete Dashwood, program maintenance is obsolete.
Nice observation. I surrender.
>
>
> You wouldn't say that if your program was running 20,000 transactions PER SECOND, in real
> time.
>
I worked in a shop where the original programmer had written a program
where the code obfuscated the function. After a morning's
consideration, a colleague re-wrote the code and cut the run-time down
from 100 cpu seconds to 2 cpu seconds. In the same shop, programs
which ran several times daily and processed hundreds of thousands of
records each day were re-written and saved 75% of the cpu time in the
process.
>
> You'll never make it in the world of contract programming.
>
I did make it as a contractor. I followed in-house standards and in
one shop, re-wrote them. I pride myself on trying to make the new code
blend in to the program.
> Standards are a misnomer because they are different in every company and even between
> departments within the same company. De facto standards are usually different from the
> published ones. Their purpose is not to simplify maintenance (the people who wrote the
> standard no longer do maintenance), it's to keep out competition from programmers who are
> better than management. One team lead had the candor to say "Management thinks time
> stopped in 1974. They'd have a stroke if they saw this EXIT PERFORM CYCLE. You can't do
> that! It's not in the standard because they never saw it."
>
> I walked out of that place after one week. It's pretty typical of non-mainframe Cobol
> shops (mainframe shops are ALL like that). The ones that aren't like that say "Yeah, we
> maintain it. When we have to add significant code, we rewrite the thing in C."
Interestingly, the reason for each standard dictat is rarely
documented so old dictats are improperly retained when the compiler
moves on (see the thou shalt not use COMPUTE debate).
| |
| Pete Dashwood 2007-09-02, 7:56 am |
|
"Alistair" <alistair@ld50macca.demon.co.uk> wrote in message
news:1188729683.093849.310010@y42g2000hsy.googlegroups.com...
> On 2 Sep, 02:32, docdw...@panix.com () wrote:
>
> Just like Pete's toolbox?
Alistair, now you are simply .
My toolbox is far from porous... :-)
Pete.
--
"I used to write COBOL...now I can do anything."
| |
| Charles Hottel 2007-09-02, 7:56 am |
|
"Alistair" <alistair@ld50macca.demon.co.uk> wrote in message
news:1188729752.179929.179940@50g2000hsm.googlegroups.com...
> On 2 Sep, 02:40, "Charles Hottel" <chot...@earthlink.net> wrote:
>
> The poo related version.
>
Good, I first thought you meant religious COBOL. Actually I did not
recognize that it was a proper, though ambiguous word.
| |
| Robert 2007-09-02, 6:56 pm |
| Here's the alignment test:
05 comp5-number comp-5 pic s9(09) sync.
05 test-byte pic x(01).
05 unaligned-number comp-5 pic s9(09).
add 1 to comp5-number *> time - 1
add 1 to unaligned-number *> time - 15
Wow, it's hard to believe an extra memory cycle makes it run 15 times slower. At worst, it
should be 4 times slower -- 2x for the load and 2x for the store. It appears the compiler
is generating extra code for the unaligned case. Let's blind the compiler so it can't
tell.
01 misaligned-area sync.
05 array-element occurs 4096 indexed x-index.
10 misaligned-number comp-5 pic s9(09).
10 to-cause-misalignment pic x(01).
move 1 to s-subscript
add 1 to misaligned-number (s-subscript) *> time - 5
move 2 to s-subscript
add 1 to misaligned-number (s-subscript) *> time - 4
Times are almost the same. We know from another test that the subscript costs time 2, so
the add times are 3 and 2. The second case here is identical to the second case above.
Eliminating the extra code made it run 7 times faster. Let's verify that.
add 1 to misaligned-number (1) *> time - 1
add 1 to misaligned-number (2) *> time - 15
Results are the same as the first pair above. When the compiler KNOWS whether the word is
aligned or not, it makes the unaligned case 7 times slower than when it DOESN'T know.
This makes the alignment myth a self-fulfilling prophecy. Alignment is important, not
because the machine cares (any longer) but because the compiler mistakenly THINKS it
matters.
| |
| Robert 2007-09-02, 6:56 pm |
| On Sun, 02 Sep 2007 03:46:28 -0700, Alistair <alistair@ld50macca.demon.co.uk> wrote:
>
>No myth. It takes more cpu cycles to calculate and then use a
>displacement for an ODO data-item referenced by subscript or index
>than it does to refer to a fixed-position data-item.
Computing the offset of a subscript is exactly the same, whether the table has ODO or not.
I tested that and posted the times.
What you say would be true for items FOLLOWING the ODO, but Cobol doesn't allow you to do
that.
| |
| Robert 2007-09-02, 6:56 pm |
| On Sun, 02 Sep 2007 03:53:49 -0700, Alistair <alistair@ld50macca.demon.co.uk> wrote:
>Interestingly, the reason for each standard dictat is rarely
>documented so old dictats are improperly retained when the compiler
>moves on (see the thou shalt not use COMPUTE debate).
Or the one about numbering paragraphs, so you can find them in a 200 page listing. That
should have been dropped when we started using text editors.
| |
|
| In article <1188729683.093849.310010@y42g2000hsy.googlegroups.com>,
Alistair <alistair@ld50macca.demon.co.uk> wrote:
>On 2 Sep, 02:32, docdw...@panix.com () wrote:
[snip]
>
>Just like Pete's toolbox?
I'm completely unfamiliar with what the box contains, Mr Maclean, let
alone the container's quality... but I'm sure that Mr Dashwood would
consider assuring you that both are not porous...
.... their quality is the fines'.
(note to non-native English speakers: 'porous', in some dialects of
English, is almost homonymous to 'poorest'; likewise, in some dialects,
the final 't' of some words is dropped.)
DD
| |
|
| In article <2d7kd39f1h14n43q59nse65paghisgftgp@4ax.com>,
Robert <no@e.mail> wrote:
[snip]
>One team lead had the candor to say "Management
>thinks time
>stopped in 1974. They'd have a stroke if they saw this EXIT PERFORM
>CYCLE. You can't do
>that! It's not in the standard because they never saw it."
A similar situation was described to this newsgroup a mere
eight-and-a-half years ago or so:
<http://groups.google.com/group/comp...7?output=gplain>
Search for 'bedrool' (a mis-typing of 'bedroll') (no ').
DD
| |
| Robert 2007-09-02, 6:56 pm |
| On Sun, 02 Sep 2007 03:53:49 -0700, Alistair <alistair@ld50macca.demon.co.uk> wrote:
> I followed in-house standards and in one shop, re-wrote them.
Standards I've written told people what TO do, rather than what NOT to do.
>I pride myself on trying to make the new code blend in to the program.
I write the code well, get it working, finally edit it to follow standards.
| |
| Michael Mattias 2007-09-02, 6:56 pm |
| "Robert" <no@e.mail> wrote in message
news:rneld31af20op9hl2qrv04q2l735omnul0@4ax.com...
>
> Wow, it's hard to believe an extra memory cycle makes it run 15 times
> slower. At worst, it
> should be 4 times slower -- 2x for the load and 2x for the store. It
> appears the compiler
> is generating extra code for the unaligned case. Let's blind the compiler
> so it can't
> tell.
This is why some compilers are better - and more expensive - than others.
They transparently handle things like alignment, or using a 'decrement'
instead of an 'increment' to control loop counters, or combine common
literals dispersed throughout the source code into a non-redundant literal
pool.
This is why COBOL, FORTRAN and BASIC compilers are called 'high level'
language products. You - the applications programmer - tell the compiler
what you want to happen, and the compiler assumes responsibility for making
it happen efficiently when it tells the hardware what to do.
MCM
| |
| Robert 2007-09-02, 6:56 pm |
| On Sun, 02 Sep 2007 03:00:38 GMT, spambait@milmac.com (Doug Miller) wrote:
>If you did it only for fun, why publish the results here? Especially, why
>label a proposition "BUSTED" when you've conducted only one incomplete test on
>it?
Because the manual contains bad advice. It even says:
-- quotation --
Other suggestions (to help prevent inefficient coding)
* REMOVE "ROUNDED"
* REMOVE "ERROR"
* REMOVE "INITIALIZE"
* REMOVE "CORRESPONDING"
* REMOVE "THRU"
* REMOVE "THROUGH"
By removing these reserved words you prevent the possibility that code using these
inefficient constructs will be added to the program.
| |
| Robert 2007-09-02, 6:56 pm |
| On Sun, 02 Sep 2007 03:46:28 -0700, Alistair <alistair@ld50macca.demon.co.uk> wrote:
>
>No myth. It takes more cpu cycles to calculate and then use a
>displacement for an ODO data-item referenced by subscript or index
>than it does to refer to a fixed-position data-item.
A good use for ODO is on tables that will be SEARCHed ALL. With ODO, the search will take
log2(n). Without ODO, padded with high values, the search will take log2(max). On average,
assuming the table is half full, the ODO search will run 10% faster.
Most programmers use the slower method because they believe the myth that ODO is slow.
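For anyone who has not combined the two, here is a minimal sketch of SEARCH ALL over an ODO table. It is an illustration only, not one of the timed tests: the program name, the data names and the 2048-entry fill are all made up, and it assumes free-format source.

```cobol
identification division.
program-id. OdoSearch.
data division.
working-storage section.
*> Number of entries actually in use (half of the 4096 maximum).
01  entry-count          binary pic s9(09) value 2048.
01  fill-index           binary pic s9(09).
01  wanted-key           pic 9(05) value 1500.
01  lookup-table.
    05  table-entry occurs 1 to 4096 times
            depending on entry-count
            ascending key is entry-key
            indexed by t-index.
        10  entry-key    pic 9(05).
        10  entry-data   pic x(10).
procedure division.
    *> Load keys 1 .. entry-count in ascending order.
    perform varying fill-index from 1 by 1
            until fill-index > entry-count
        move fill-index to entry-key (fill-index)
        move 'payload' to entry-data (fill-index)
    end-perform
    *> SEARCH ALL only looks at the first entry-count occurrences,
    *> so the unused part of the table needs no high-values padding.
    search all table-entry
        at end
            display 'not found'
        when entry-key (t-index) = wanted-key
            display 'found key ' entry-key (t-index)
    end-search
    goback.
```

Assuming the runtime does a textbook binary search, halving the searched range from 4096 to 2048 entries saves about one probe out of a dozen. That is roughly 8%, in the same ballpark as the 10% quoted above.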
| |
| Doug Miller 2007-09-02, 6:56 pm |
| In article <rneld31af20op9hl2qrv04q2l735omnul0@4ax.com>, Robert <no@e.mail> wrote:
>01 misaligned-area sync.
> 05 array-element occurs 4096 indexed x-index.
> 10 misaligned-number comp-5 pic s9(09).
> 10 to-cause-misalignment pic x(01).
>
And have you examined a load map to see what the addresses of these items are,
specifically 'misaligned-number(2)' ?
I may be mistaken... but it is my belief that the compiler will *force*
alignment by emitting a slack byte, making the length of 'array-element' one
byte longer than you expect ...
>move 1 to s-subscript
>add 1 to misaligned-number (s-subscript) *> time - 5
>move 2 to s-subscript
>add 1 to misaligned-number (s-subscript) *> time - 4
>
... with this entirely predictable result:
>Times are almost the same.
--
Regards,
Doug Miller (alphag at milmac dot com)
It's time to throw all their damned tea in the harbor again.
| |
| Doug Miller 2007-09-02, 6:56 pm |
| In article <0jjld3hiedrd5ifsjo0gfkja96go48uic7@4ax.com>, Robert <no@e.mail> wrote:
>On Sun, 02 Sep 2007 03:00:38 GMT, spambait@milmac.com (Doug Miller) wrote:
>
>
>
>Because the manual contains bad advice.
In my opinion, you have not yet adequately demonstrated that the advice was
bad.
--
Regards,
Doug Miller (alphag at milmac dot com)
It's time to throw all their damned tea in the harbor again.
| |
| Robert 2007-09-02, 6:56 pm |
| On Sun, 02 Sep 2007 18:44:23 GMT, spambait@milmac.com (Doug Miller) wrote:
>In article <rneld31af20op9hl2qrv04q2l735omnul0@4ax.com>, Robert <no@e.mail> wrote:
>
>And have you examined a load map to see what the addresses of these items are,
>specifically 'misaligned-number(2)' ?
>
>I may be mistaken... but it is my belief that the compiler will *force*
>alignment by emitting a slack byte, making the length of 'array-element' one
>byte longer than you expect ...
Whoops. The manual says "If the SYNCHRONIZED clause is specified with a non-elementary
item, then the clause applies to all the items subordinate to that non-elementary item."
I need to rerun the test without SYNC on the 01 level.
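Short of reading a load map, one way to see whether slack bytes were inserted is to compare how much storage the table takes with and without SYNC on the group. A minimal sketch, assuming free-format source and a compiler that accepts SYNC on a group item as the manual passage describes. The program and data names are invented for illustration.

```cobol
identification division.
program-id. SlackCheck.
data division.
working-storage section.
*> Same shape as misaligned-area, with SYNC on the 01 level.
01  synced-area sync.
    05  synced-element occurs 4096.
        10  synced-number      comp-5 pic s9(09).
        10  synced-filler      pic x(01).
*> Identical layout without SYNC on the 01 level.
01  plain-area.
    05  plain-element occurs 4096.
        10  plain-number       comp-5 pic s9(09).
        10  plain-filler       pic x(01).
01  bytes-per-element          pic 9(04).
procedure division.
    *> Total group length divided by the number of occurrences
    *> gives the storage each element really occupies.
    compute bytes-per-element =
        function length (synced-area) / 4096
    display 'with SYNC on the 01:    ' bytes-per-element
    compute bytes-per-element =
        function length (plain-area) / 4096
    display 'without SYNC on the 01: ' bytes-per-element
    goback.
```

Five bytes per element would mean no slack was added, so every other misaligned-number really is off its word boundary. Anything larger means the compiler padded each occurrence, and the "misaligned" test was measuring aligned data.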
| |
| Alistair 2007-09-02, 6:56 pm |
|
Charles Hottel wrote:
> "Alistair" <alistair@ld50macca.demon.co.uk> wrote in message
> news:1188729752.179929.179940@50g2000hsm.googlegroups.com...
> Good, I first thought you meant religious COBOL. Actually I did not
> recognize that it was a proper, though ambiguous word.
When I first came across the word shi'ite I was mildly amused in
observing the closeness to shite in spelling. I have avoided being
crass enough to insult shi'ite muslims by referring to them in the
shorter form. I think such an insult is beneath me.
However, judging by the posts to this group some people take their
variant of Cobol very religiously.
| |
| Doug Miller 2007-09-02, 6:56 pm |
| In article <0b2md31ngkobu211m002lco8krt2ufdusm@4ax.com>, Robert <no@e.mail> wrote:
>On Sun, 02 Sep 2007 18:44:23 GMT, spambait@milmac.com (Doug Miller) wrote:
>
>Whoops. The manual says "If the SYNCHRONIZED clause is specified with a
> non-elementary
>item, then the clause applies to all the items subordinate to that
> non-elementary item."
>I need to rerun the test without SYNC on the 01 level.
>
Yes, *and* verify by examination of a load map that the COMP item at the 10
level is in fact misaligned, too.
--
Regards,
Doug Miller (alphag at milmac dot com)
It's time to throw all their damned tea in the harbor again.
| |
| Alistair 2007-09-02, 6:56 pm |
|
Robert wrote:
> Here's the alignment test:
>
>
> This makes the alignment myth a self-fulfilling prophecy. Alignment is important, not
> because the machine cares (any longer) but because the compiler mistakenly THINKS it
> matters.
Alignment on what hardware? a pc? I don't recall what reason alignment
existed on mainframes but as far as I recall the use of aligned data
only resulted in undocumented fillers between data items to nudge
aligned data up to the boundaries. It may be that, if you are running
on a pc, the extra overhead with aligned data is an artefact that
would not appear on a mainframe where alignment is/was important.
| |
| Alistair 2007-09-02, 6:56 pm |
|
Robert wrote:
> On Sun, 02 Sep 2007 03:46:28 -0700, Alistair <alistair@ld50macca.demon.co.uk> wrote:
>
>
> Computing the offset of a subscript is exactly the same, whether the table has ODO or not.
It is the data-item that has its displacement calculated, not the
subscript.
> I tested that and posted the times.
>
> What you say would be true for items FOLLOWING the ODO, but Cobol doesn't allow you to do
> that.
If what you say is true (and I call into question the veracity of it)
then it would be more efficient for us to code all records as ODO and
not as fixed position. But we don't. If ODO is as efficient on your
computer as you make out then I would question the competency of your
compiler.
| |
| Robert 2007-09-02, 6:56 pm |
| On Sun, 02 Sep 2007 13:05:23 -0700, Alistair <alistair@ld50macca.demon.co.uk> wrote:
>
>Robert wrote:
>
>
>It is the data-item that has its displacement calculated, not the
>subscript.
You are right.
>
>If what you say is true (and I call into question the veracity of it)
>then it would be more efficient for us to code all records as ODO and
>not as fixed position.
It would be. Fixed length records are not used outside the mainframe world. The norm is a
delimiter at the end rather than a length at the front (record sequential), so ODO is not
useful in that case.
> But we don't. If ODO is as efficient on your
>computer as you make out then I would question the competency of your
>compiler.
Here's a challenge: post a program that demonstrates the slowness of ODO.
| |
| William M. Klein 2007-09-03, 3:55 am |
| "Robert" <no@e.mail> wrote in message
news:uchld3petcfs3qe7jn5joglhrgo67alpjp@4ax.com...
> On Sun, 02 Sep 2007 03:46:28 -0700, Alistair <alistair@ld50macca.demon.co.uk>
> wrote:
>
<snip>
> Computing the offset of a subscript is exactly the same, whether the table has
> ODO or not.
> I tested that and posted the times.
>
> What you say would be true for items FOLLOWING the ODO, but Cobol doesn't
> allow you to do
> that.
>
Micro Focus - the ONLY compiler that you claim to be testing - does allow it.
Furthermore, the timing will be different depending on whether you use ODOSLIDE
or NOODOSLIDE.
P.S. As you are using NOTRUNC as your compiler option, you can't even claim to
be testing for "conforming" COBOL source code. (I suspect several of your
results MIGHT be different with TRUNC - per standard COBOL - and NO-IBM-COMP
might also make a difference).
| |
| Robert 2007-09-03, 3:55 am |
| On Mon, 03 Sep 2007 04:02:25 GMT, "William M. Klein" <wmklein@nospam.netcom.com> wrote:
>"Robert" <no@e.mail> wrote in message
> news:uchld3petcfs3qe7jn5joglhrgo67alpjp@4ax.com...
><snip>
>
>Micro Focus - the ONLY compiler that you claim to be testing - does allow it.
>Furthermore, the timing will be different depending on whether you use ODOSLIDE
>or NOODOSLIDE.
I've seen that done in PL/I, but NEVER in Cobol.
>P.S. As you are using NOTRUNC as your compiler option, you can't even claim to
>be testing for "conforming" COBOL source code. (I suspect several of your
>results MIGHT be different with TRUNC - per standard COBOL - and NO-IBM-COMP
>might also make a difference).
For this speed test, I tried to eliminate overhead unrelated to the test topic. NOTRUNC
makes "add 1" a pure machine language ADD (or INC), without the overhead of testing for
decimal overflow.
NOIBMCOMP would cause SYNC to be ignored. I wouldn't be able to test alignment.
| |
| Clark F Morris 2007-09-03, 6:55 pm |
| On Fri, 31 Aug 2007 21:22:35 -0500, Robert <no@e.mail> wrote:
>In the Micro Focus manual Server Express (2.2 & 4.0):Program Development, chapter 1 part 1
>is titled Writing Efficient Programs. Its top billing tells us they think speed is a Very
>Important Topic we should know about. For fun, I put their advice to the test.
>
>The machine I used is a high-end HP Superdome with 64 PA (RISC) processors. Of course,
>the Cobol test program was only using one of them. For general reference, other timing
>tests showed mid-range Sun SPARC CPUs to be 3 times faster than the PA, and HP Superdomes
>with Itaniums to be 6-10 times faster. Despite that, customer demand forced HP to rescind
>its decision to obsolete the PA. These tests were run on a 'new generation' PA.
>
>I added a few comparisons that are not from the MF manual, but are widely believed in the
>Cobol community. They are styled "Legacy:". Execution times are in microseconds (us), with
>a resolution of plus or minus 5. I'll describe the timing methodology toward the end; for
>now, take my word that the speeds are accurate.
>
>Proposition: Use simple two-operand arithmetic statements wherever possible.
>
>Test:
>05 binary-number binary pic s9(09) sync.
>
>add 1 to binary-number *> 1 us
>compute binary-number = binary-number + 1 *> 1 us
>
>add 1 to binary-number
>multiply 5 by binary-number
>divide 5 into binary-number *> 50 us
>
>compute binary-number = ((binary-number + 1) * 5) / 5 *> 445 us
>
>Finding: busted for simple cases, confirmed for cases with more than one operation.
>
>Proposition: "Do not use the REMAINDER, ROUNDED, ON SIZE ERROR or CORRESPONDING phrases if
>you want the fastest performance. No optimization is done on arithmetic statements if the
>ON SIZE ERROR phrase is used. For this reason, we recommend you do not use this phrase if
>high performance is required. The ROUNDED phrase impacts performance, but it is generally
>faster to use ROUNDED than try to round the result using your own routine. "
>
>Test:
>compute binary-number rounded = binary-number + 1 *> 1 us (no penalty)
>add 1 to binary-number *> 15 us
> on size error display 'overflow'
>end-add
>
>Finding: busted for rounded, confirmed for size error.
>
>Legacy belief: indexes are faster than subscripts
>
>Test:
>05 s-subscript binary pic s9(09) sync.
>01 misaligned-area sync.
> 05 array-element occurs 4096 indexed x-index.
> 10 misaligned-number comp-5 pic s9(09).
> 10 to-cause-misalignment pic x(01).
>move array-element (s-subscript) to test-byte *> 3 us
>move array-element (x-index) to test-byte *> 6 us
>
>Finding: BUSTED. Index is actually slower.
>
>Proposition: When incrementing or decrementing a counter, terminate it with a literal
>value rather than a value held in a data item. For example, to execute a loop n times, set
>the counter to n and then decrement the counter until it becomes zero, rather than
>incrementing the counter from zero to n.
>
>Test:
>perform varying binary-number from 10 by -1 until binary-number = 0 *> 150 us
>perform varying binary-number from 1 by 1 until binary-number > 10 *> 154 us
>
>Finding: BUSTED
>
>Proposition: Access to tables defined with OCCURS ... DEPENDING is less efficient than
>access to tables of fixed size, and so should be avoided where high performance is needed.
>
>Test:
>
>01 depending-area.
> 05 depending-element occurs 1 to 4096 depending on binary-number.
> 10 comp-5 pic s9(09).
> 10 pic x(01).
>move array-element (s-subscript) to test-byte *> 3 us
>move depending-element (s-subscript) to test-byte *> 3 us
>
>Finding: BUSTED
>
>Proposition: Arithmetic on COMP-3 data items is performed in packed decimal and is much
>slower than arithmetic on COMP items. It should be avoided.
>
>Test:
>05 display-number pic 9(09).
>05 packed-number comp-3 pic s9(09).
>
>add 1 to display-number *> 174 us
>add 1 to packed-number *> 160 us
>
>Finding: CONFIRMED. Packed is almost as slow as display. It was fast on 1970-era
>mainframes. There is no longer any reason to use it. If you want to save space, look at
>space-filled strings and filler-padding.
There may be other good reasons to go to display but if you are using
a z series computer (latest evolution of the IBM 360), packed decimal
is still faster than display. Most results depend on the computer
architecture. I suspect in answer to your next question that
alignment still matters on some currently sold computers with an
architecture different from the ones tested on.
>
>To be continued with the most unexpected and interesting case: does aligning numbers on
>memory boundaries matter?
| |
| Jeff Campbell 2007-09-03, 9:55 pm |
| Robert wrote:
> On Sat, 01 Sep 2007 16:15:22 -0600, Jeff Campbell <n8wxs@arrl.net> wrote:
>
>
>
> Here it is:
>
>
> * ---------------------------------------------------------------------
> * Findings
> * Aligned 1
> * Unaligned 15
> * Misaligned (1) 5
> * Misaligned (2) 4
> * Binary 1
> * Linkage 30
> * Compute n=n+1 1
> * Rounded 1
> * size error 18
> * Display 174
> * Packed 160
> * Arithmetic 50
> * Compute 445
> * Index 6
> * Subscript 3
> * Depending 3
> * Evaluate true 2
> * Evaluate expression 3
> * Go to depending 7
> * Evaluate case 11
> * Initialize 346
> * Move zeros 339
> * Dec to zero 149
> * Inc to 10 154
>
> $SET SOURCEFORMAT"FREE"
> $SET NOBOUND
> $SET OPT"2"
> $SET NOTRUNC
> $SET IBMCOMP
> $SET NOCHECK
> $SET ALIGN"8"
> identification division.
> program-id. Speed1.
> author. Robert Wxagner.
>
> data division.
> working-storage section.
> 01 test-data.
> 05 comp5-number comp-5 pic s9(09) sync.
> 05 test-byte pic x(01).
> 05 unaligned-number comp-5 pic s9(09).
> 05 pic x(03).
> 05 binary-number binary pic s9(09) sync.
> 05 display-number pic 9(09).
> 05 packed-number comp-3 pic s9(09).
> 05 s-subscript binary pic s9(09) sync.
>
> 01 depending-area.
> 05 depending-element occurs 1 to 4096 depending on binary-number.
> 10 comp-5 pic s9(09).
> 10 pic x(01).
> 01 misaligned-area sync.
> 05 array-element occurs 4096 indexed x-index.
> 10 misaligned-number comp-5 pic s9(09).
> 10 to-cause-misalignment pic x(01).
>
> 01 timer-variables.
> 05 test-name pic x(30).
> 05 repeat-factor value 100000000 binary pic s9(09).
> 05 current-date-structure.
> 10 pic x(08).
> 10 time-now-hhmmsshh.
> 15 hours pic 9(02).
> 15 minutes pic 9(02).
> 15 seconds pic 9(02).
> 15 hundredths pic 9(02).
> 10 pic x(05).
> 05 time-now pic 9(06)v99.
> 05 time-start pic 9(06)v99.
> 05 timer-overhead value zero pic 9(06)v99.
> 05 elapsed-time pic s9(06)v99.
> 05 elapsed-time-display.
> 10 elapsed-time-edited pic z(05).
>
> linkage section.
> 01 linkage-number binary pic s9(09) sync.
>
> procedure division.
>
> initialize test-data, misaligned-area
>
> move 'Null test' to test-name
> perform timer-on
> perform timer-on
> perform repeat-factor times
> exit perform cycle
> end-perform
> perform timer-off
> compute timer-overhead = (time-now - time-start)
>
> move 'Aligned' to test-name
> perform timer-on
> perform repeat-factor times
> add 1 to comp5-number
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Unaligned' to test-name
> perform timer-on
> perform repeat-factor times
> add 1 to unaligned-number
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Misaligned (1)' to test-name
> move 1 to s-subscript
> perform timer-on
> perform repeat-factor times
> add 1 to misaligned-number (s-subscript)
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Misaligned (2)' to test-name
> *> if this is faster than Unaligned above,
> *> compiler generated alignment code is slowing things down
> move 2 to s-subscript
> perform timer-on
> perform repeat-factor times
> add 1 to misaligned-number (s-subscript)
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Binary' to test-name
> move zero to binary-number
> perform timer-on
> perform repeat-factor times
> add 1 to binary-number
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Linkage' to test-name
> set address of linkage-number to address of binary-number
> move zero to linkage-number
> perform timer-on
> perform repeat-factor times
> add 1 to linkage-number
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Compute n=n+1' to test-name
> move zero to binary-number
> perform timer-on
> perform repeat-factor times
> compute binary-number = binary-number + 1
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Rounded' to test-name
> move zero to binary-number
> perform timer-on
> perform repeat-factor times
> compute binary-number rounded = binary-number + 1
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'size error' to test-name
> move zero to binary-number
> perform timer-on
> perform repeat-factor times
> add 1 to binary-number
> on size error display 'overflow'
> end-add
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Display' to test-name
> perform timer-on
> perform repeat-factor times
> *> add 1 to display-number
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Packed' to test-name
> perform timer-on
> perform repeat-factor times
> *> add 1 to packed-number
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Arithmetic' to test-name
> move zero to binary-number
> perform timer-on
> perform repeat-factor times
> add 1 to binary-number
> multiply 5 by binary-number
> divide 5 into binary-number
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Compute' to test-name
> move zero to binary-number
> divide 10 into repeat-factor
> perform timer-on
> perform repeat-factor times
> compute binary-number = ((binary-number + 1) * 5) / 5
> exit perform cycle
> end-perform
> perform timer-off
> multiply 10 by repeat-factor
>
> move 'Index' to test-name
> set x-index to 1000
> perform timer-on
> perform repeat-factor times
> move array-element (x-index) to test-byte
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Subscript' to test-name
> move 1000 to s-subscript
> perform timer-on
> perform repeat-factor times
> move array-element (s-subscript) to test-byte
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Depending' to test-name
> move 2000 to binary-number
> move 1000 to s-subscript
> perform timer-on
> perform repeat-factor times
> move depending-element (s-subscript) to test-byte
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Evaluate true' to test-name
> move zero to binary-number
> perform timer-on
> perform repeat-factor times
> evaluate true
> when binary-number equal to zero
> exit perform cycle
> when other
> display 'error'
> end-evaluate
> end-perform
> perform timer-off
>
> move 'Evaluate expression' to test-name
> move zero to binary-number
> perform timer-on
> perform repeat-factor times
> evaluate binary-number
> when zero
> exit perform cycle
> when other
> display 'error'
> end-evaluate
> end-perform
> perform timer-off
>
> move 'Go to depending' to test-name
> move 2 to binary-number
> perform timer-on
> perform go-depending-test repeat-factor times
> perform timer-off
>
> move 'Evalaute case' to test-name
> move 2 to binary-number
> perform timer-on
> perform evaluate-case-test repeat-factor times
> perform timer-off
>
> move 'Initialize' to test-name
> perform timer-on
> perform repeat-factor times
> initialize test-data
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Move zeros' to test-name
> perform timer-on
> perform repeat-factor times
> move zeros to
> comp5-number
> test-byte
> unaligned-number
> binary-number
> display-number
> packed-number
> s-subscript
> exit perform cycle
> end-perform
> perform timer-off
>
>
> move 'Dec to zero' to test-name
> perform timer-on
> perform repeat-factor times
> perform varying binary-number from 10 by -1 until binary-number = 0
> end-perform
> exit perform cycle
> end-perform
> perform timer-off
>
> move 'Inc to 10' to test-name
> perform timer-on
> perform repeat-factor times
> perform varying binary-number from 1 by 1 until binary-number > 10
> end-perform
> exit perform cycle
> end-perform
> perform timer-off
>
> goback
>
>
> . go-depending-test section.
> go to p1 p2 p3 depending on binary-number
> display 'error'
> . p1. display 'error'
> . p2. exit section
> . p3. display 'error'
> . evaluate-case-test section.
> evaluate binary-number
> when 1
> display 'error'
> when 2
> exit section
> when other
> display 'error'
> end-evaluate
>
> . end-of-previous section
> . timer-on.
> perform read-the-time
> move time-now to time-start
> . timer-off.
> perform read-the-time
> compute elapsed-time rounded = ((time-now - time-start)
> * 100000000 / repeat-factor)
> - timer-overhead
>
> if elapsed-time not greater than zero
> move 'error' to elapsed-time-display
> else
> compute elapsed-time-edited rounded = elapsed-time * 10
> end-if
> display test-name elapsed-time-display
> . read-the-time.
> accept time-now-hhmmsshh from time
> *> move function current-date to current-date-structure
> compute time-now =
> ((((hours * 60) +
> minutes) * 60) +
> seconds) +
> (hundredths / 100)
> .
Both the HP compiler on my Alpha and the Fujitsu compiler (COBOL97)
I have access to on a Windows PC do not like this code. 8-) 8-)
I've not used the Micro Focus product so am unfamiliar with it.
Question:
What is the advantage of using EXIT PERFORM CYCLE over CONTINUE?
Jeff
| |
| Robert 2007-09-04, 3:55 am |
| On Mon, 03 Sep 2007 13:26:53 -0300, Clark F Morris <cfmpublic@ns.sympatico.ca> wrote:
>On Fri, 31 Aug 2007 21:22:35 -0500, Robert <no@e.mail> wrote:
>
>There may be other good reasons to go to display but if you are using
>a z series computer (latest evolution of the IBM 360), packed decimal
>is still faster than display.
I would expect binary to be the fastest on a z series, as it is on every other computer.
>Most results depend on the computer architecture.
The program shouldn't be tied to a specific machine, especially when the feature is out of
step with industry norms. Doing so locks users into one manufacturer, which is contrary to
the spirit of high level languages.
> I suspect in answer to your next question that
>alignment still matters on some currently sold computers with an
>architecture different from the ones tested on.
Yes, some CPUs don't require alignment, e.g. IBM, Motorola, Intel 16/32 bit. Some throw an
exception and expect the operating system to handle it in software (expensively),
e.g. IA-64 and most RISC machines including PA, PowerPC and Alpha. A few operating systems
abort the process when they get a misalignment fault, e.g. old Apple.
Micro Focus' use of IBMCOMP for the compiler option that turns on memory boundary
awareness gives the impression that alignment is (or was) a concern in the IBM mainframe world. Not
in my experience. I've never seen a mainframe, IBM or other, throw a fault for
misalignment. It doesn't seem appropriate for ANY machine with an L2 cache to do so. I
wrote the speed program primarily to test whether a modern PA processor (88xx/89xx, which
have L2) is slowed down by misalignment, secondarily to disprove (or not) Cobol myths
such as ODO being slow and demonstrate real inefficiencies such as packed decimal.
FWIW, IBM coined the word cache in the context of memory in 1967. The S/360 model 85 was
probably the first computer to use memory cache.
http://en.wikipedia.org/wiki/Memory_cache
| |
| Robert 2007-09-04, 3:55 am |
| On Mon, 03 Sep 2007 18:12:02 -0600, Jeff Campbell <n8wxs@arrl.net> wrote:
>Both the HP compiler on my Alpha and the Fujitsu compiler (COBOL97)
>I have access to on a windows PC do not like this code. 8-) 8-)
>
>I've not used the Micro Focus product so am unfamiliar with it.
>
>Question:
> What is the advantage of using EXIT PERFORM CYCLE over CONTINUE?
CONTINUE is the same as nothing, an empty loop. The null test was getting optimized out
when it was an empty loop. I added EXIT PERFORM CYCLE to stop that from happening, then
had to add it to all the others.
Removing the EXIT PERFORM CYCLEs, or replacing them with CONTINUE, would not seriously
affect the results.
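To make the null-test idea concrete, here is a minimal sketch, not the posted program. It times a loop whose body is only EXIT PERFORM CYCLE, times the same loop with one ADD in it, and reports the difference per iteration. The program name nulltest and the names t-hours, t-minutes, t-seconds, t-hundredths, null-seconds, test-seconds, us-per-iteration and us-display are made up for this illustration, and the repeat count of 10,000,000 is just a starting value. It assumes a compiler that accepts EXIT PERFORM CYCLE and *> comments; an optimizer may still shrink or remove either loop.

```cobol
       identification division.
       program-id. nulltest.
       data division.
       working-storage section.
       01  repeat-factor     binary pic s9(9) value 10000000.
       01  binary-number     binary pic s9(9) value zero.
       01  time-raw.
           05  t-hours       pic 99.
           05  t-minutes     pic 99.
           05  t-seconds     pic 99.
           05  t-hundredths  pic 99.
       01  time-now          pic s9(7)v99 value zero.
       01  time-start        pic s9(7)v99 value zero.
       01  null-seconds      pic s9(7)v99 value zero.
       01  test-seconds      pic s9(7)v99 value zero.
       01  us-per-iteration  pic s9(5)v9(4) value zero.
       01  us-display        pic -(5)9.9(4).
       procedure division.
       main-para.
      *> Pass 1: loop overhead only. The body is just
      *> EXIT PERFORM CYCLE, so the loop is not literally empty.
           perform read-the-time
           move time-now to time-start
           perform repeat-factor times
               exit perform cycle
           end-perform
           perform read-the-time
           compute null-seconds = time-now - time-start
      *> Pass 2: the statement under test plus the same overhead.
           perform read-the-time
           move time-now to time-start
           perform repeat-factor times
               add 1 to binary-number
               exit perform cycle
           end-perform
           perform read-the-time
           compute test-seconds = time-now - time-start
      *> Net cost per iteration, in microseconds.
           compute us-per-iteration rounded =
               (test-seconds - null-seconds) * 1000000
               / repeat-factor
           move us-per-iteration to us-display
           display 'add 1, net us per iteration: ' us-display
           goback.
       read-the-time.
      *> Hundredths-of-a-second clock; ignores midnight rollover.
           accept time-raw from time
           compute time-now =
               ((t-hours * 60 + t-minutes) * 60 + t-seconds)
               + t-hundredths / 100
           .
```

Because the clock only resolves hundredths of a second, each pass needs to run for roughly a second or more before the difference means much. A net result at or below zero suggests the loop was optimized away or the repeat count is too small.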
If tests run too slowly (more than 10 seconds), lower repeat-factor to 10,000,000; if they run
too quickly (less than 1 second), raise it to 1,000,000,000.
For a valid test of misaligned, remove SYNC from the 01 level. The compiler options are
Micro Focus; you'll have to replace them with your compiler's. At minimum, you want to
turn off bounds checking. The HP Alpha compiler might be a rebranded Micro Focus.
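For anyone who wants to see what SYNC does to a record layout before running the misaligned test, a small sketch follows. It declares the same two fields with and without SYNC and prints the two group lengths. The names syncdemo, tight-group, synced-group, tight-number and synced-number are invented here and do not come from the posted program. The result depends on the compiler and platform, so no output is shown. Where SYNC is honoured and the binary item occupies 4 bytes, the second group would be expected to come out larger because of slack bytes inserted ahead of the item.

```cobol
       identification division.
       program-id. syncdemo.
       data division.
       working-storage section.
      *> One byte followed by a binary item, without SYNC.
       01  tight-group.
           05  filler            pic x.
           05  tight-number      binary pic s9(9).
      *> Same fields, but SYNC asks for natural alignment.
       01  synced-group.
           05  filler            pic x.
           05  synced-number     binary pic s9(9) sync.
       procedure division.
       main-para.
           move zero to tight-number synced-number
      *> Equal lengths mean SYNC added no slack bytes here.
           display 'without sync: ' function length (tight-group)
           display 'with sync   : ' function length (synced-group)
           goback.
```

If both lengths come out the same, the compiler is either ignoring SYNC or was already aligning the item on its own, and the misaligned test would need a different way to force an odd offset.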
| |
| William M. Klein 2007-09-04, 3:55 am |
| "Clark F Morris" <cfmpublic@ns.sympatico.ca> wrote in message
news:p7dod314t207iqabivuinitgdsj44pt2jg@
4ax.com...
> On Fri, 31 Aug 2007 21:22:35 -0500, Robert <no@e.mail> wrote:
<snip>
> There may be other good reasons to go to display but if you are using
> a z series computer (latest evolution of the IBM 360), packed decimal
> is still faster than display. Most results depend on the computer
> architecture. I suspect in answer to your next question that
> alignment still matters on some currently sold computers with an
> architecture different from the ones tested on.
Clark,
Robert was clear (in his first note) that he was quoting efficiency
recommendations FROM a Micro Focus manual and that was the ONLY compiler that he
was talking about.
I wouldn't assume that either any recommendations OR test results would
necessarily be "portable" across compilers or operating systems.
Whether I think his tests were or were not "comprehensive" - I do think that he
was fair in applying rules from the documentation for a specific compiler and
O/S to that combination and then reporting the results he got.
I could ALMOST guarantee that I could get different results (even with MF on
different platforms - and with different directives), much less on zSeries.
--
Bill Klein
wmklein <at> ix.netcom.com
| |
| William M. Klein 2007-09-04, 3:55 am |
| Binary (when working with other Binary) may or may not be faster than PD for
some cases on zSeries. However, there are even MORE options that impact this
than just TRUNC (which has 3 flavors on IBM zSeries). Furthermore, PD is usually
(not always) BEST when working with "combined" usages (such as input from a
"screen" in the same operation as something stored in a Database).
The following is the information on "comparing data types" from the Enterprise
COBOL Performance paper available at:
http://www-1.ibm.com/support/docvie...uid=swg27001475
(You might want to look at the entire paper to see what a COMPREHENSIVE set of
performance tests covers - in the way of "variations". Also it has some firm
statistics on indexes vs subscripts with this compiler.)
***
Comparing Data Types
When selecting your data types, it is important to understand the performance
characteristics of them before you use them. Shown below are some performance
considerations of doing several ADDs and SUBTRACTs on the various data types of
the specified precision.
Performance considerations for comparing data types (using ARITH(COMPAT)):
Packed decimal (COMP-3) compared to binary (COMP or COMP-4) with TRUNC(STD)
  using 1 to 9 digits: packed decimal is 30% to 60% slower than binary
  using 10 to 17 digits: packed decimal is 55% to 65% faster than binary
  using 18 digits: packed decimal is 74% faster than binary
Packed decimal (COMP-3) compared to binary (COMP or COMP-4) with TRUNC(OPT)
  using 1 to 8 digits: packed decimal is 160% to 200% slower than binary
  using 9 digits: packed decimal is 60% slower than binary
  using 10 to 17 digits: packed decimal is 150% to 180% slower than binary
  using 18 digits: packed decimal is 74% faster than binary
Packed decimal (COMP-3) compared to binary (COMP or COMP-4) with TRUNC(BIN) or COMP-5
  using 1 to 8 digits: packed decimal is 130% to 200% slower than binary
  using 9 digits: packed decimal is 85% slower than binary
  using 10 to 18 digits: packed decimal is 88% faster than binary
DISPLAY compared to packed decimal (COMP-3)
  using 1 to 6 digits: DISPLAY is 100% slower than packed decimal
  using 7 to 16 digits: DISPLAY is 40% to 70% slower than packed decimal
  using 17 to 18 digits: DISPLAY is 150% to 200% slower than packed decimal
DISPLAY compared to binary (COMP or COMP-4) with TRUNC(STD)
  using 1 to 8 digits: DISPLAY is 150% slower than binary
  using 9 digits: DISPLAY is 125% slower than binary
  using 10 to 16 digits: DISPLAY is 20% faster than binary
  using 17 digits: DISPLAY is 8% slower than binary
  using 18 digits: DISPLAY is 25% faster than binary
DISPLAY compared to binary (COMP or COMP-4) with TRUNC(OPT)
  using 1 to 8 digits: DISPLAY is 350% slower than binary
  using 9 digits: DISPLAY is 225% slower than binary
  using 10 to 16 digits: DISPLAY is 380% slower than binary
  using 17 digits: DISPLAY is 580% slower than binary
  using 18 digits: DISPLAY is 35% faster than binary
DISPLAY compared to binary (COMP or COMP-4) with TRUNC(BIN) or COMP-5
  using 1 to 4 digits: DISPLAY is 400% to 440% slower than binary
  using 5 to 9 digits: DISPLAY is 240% to 280% slower than binary
  using 10 to 18 digits: DISPLAY is 70% to 80% faster than binary
--
Bill Klein
wmklein <at> ix.netcom.com
| |
| Howard Brazee 2007-09-04, 6:55 pm |
| On Fri, 31 Aug 2007 21:22:35 -0500, Robert <no@e.mail> wrote:
>Finding: CONFIRMED. Packed is almost as slow as display. It was fast on 1970-era
>mainframes. There is no longer any reason to use it. If you want to save space, look at
>space-filled strings and filler-padding.
Which illustrates that the thing that counts in this kind of test is
knowing that your tests were for a specific compiler and
hardware (and possibly compiler optimization settings).
The machine I program with still has hardware support for Packed
decimal.
| |
| n8wxs@arrl.net 2007-09-04, 6:55 pm |
| On Sep 3, 10:46 pm, Robert <n...@e.mail> wrote:
> On Mon, 03 Sep 2007 18:12:02 -0600, Jeff Campbell <n8...@arrl.net> wrote:
>
>
>
> CONTINUE is the same as nothing, an empty loop. The null test was getting optimized out
> when it was an empty loop. I added EXIT PERFORM CYCLE to stop that from happening, then
> had to add it to all the others.
>
> Removing the EXIT PERFORM CYCLEs, or replacing them with CONTINUE, would not seriously
> affect the results.
>
> If tests run too slowly, >10 seconds, lower repeat-factor to 10,000,000; if they run too
> quickly, < 1 second, raise it to 1,000,000,000.
>
> For a valid test of misaligned, remove SYNC from the 01 level. The compiler options are
> Micro Focus; you'll have to replace them with your compiler's. At minimum, you want to
> turn off bounds checking. The HP Alpha compiler might be a rebranded Micro Focus.
No it is not.
Here are the results I obtained. Machine is a 600 MHz Alpha Personal Workstation running
OpenVMS 7.3-1; COBOL compiler is version 2.8-1286.
$ cobol/nocheck/notruncate/alignment/noansi_format/optimize t.cob
$ link t.obj
$ run t.exe
Null test             0
Aligned               0
Unaligned             0
Misaligned (1)        1
Misaligned (2)        1
Binary                0
Linkage               0
Compute n=n+1         0
Rounded               0
size error           30
Display               0
Packed                0
Arithmetic           45
Compute              43
Index                 2
Subscript             2
Depending             2
Evaluate true         5
Evaluate expression   5
Go to depending      23
Evalaute case        41
Initialize            0
Move zeros            0
Dec to zero          20
Inc to 10            20
Repeat count is 100,000,000.
Jeff
| |
| Clark F Morris 2007-09-04, 9:55 pm |
| On Mon, 03 Sep 2007 22:56:06 -0500, Robert <no@e.mail> wrote:
>On Mon, 03 Sep 2007 13:26:53 -0300, Clark F Morris <cfmpublic@ns.sympatico.ca> wrote:
>
>
>
>I would expect binary to be the fastest on a z series, as it is on every other computer.
>
>
>The program shouldn't be tied to a specific machine, especially when the feature is out of
>step with industry norms. Doing so locks users into one manufacturer, which is contrary to
>the spirit of high level languages.
>
>
>Yes, some CPUs don't require alignment e.g. IBM, Motorola, Intel 16/32 bit. Some throw an
>exception and expect the operating system to handle it in software (expensively),
>e.g. IA-64 and most RISC machines including PA, PowerPC and Alpha. A few operating systems
>abort the process when they get a misalignment fault e.g. old Apple.
>
>Micro Focus' use of IBMCOMP for the compiler option that turns on memory boundary
>awareness gives the impression that is (or was) a concern in the IBM mainframe world. Not
>in my experience. I've never seen a mainframe, IBM or other, throw a fault for
>misalignment. It doesn't seem appropriate for ANY machine with an L2 cache to do so. I
>wrote the speed program primarily to test whether a modern PA processor (88xx/89xx, which
>have L2) is slowed down by misalignment, secondarily to disprove (or not) Cobol myths
>such as ODO being slow and demonstrate real inefficiencies such as packed decimal.
The IBM 360 required the binary data to be appropriately aligned,
half-word, word or double word. The 370 allowed misalignment but
extracted a performance penalty. I haven't kept up with later models.
In regard to packed decimal, if you are running business programs on a
360/370/390/z series machine, then packed decimal makes sense. It
avoids several problems. On other series of machines that don't have
full fixed decimal arithmetic, different rules apply. In regard to
ODO, the operative advice is to look at the generated code for the
operations that are actually affected by the ODO and then decide.
>
>FWIW, IBM coined the word cache in the context of memory in 1967. The S/360 model 85 was
>probably the first computer to use memory cache.
>http://en.wikipedia.org/wiki/Memory_cache
Clark Morris who started on an IBM 650, went to a Honeywell 800, an
IBM 1401, a RCA 301, and various models of IBM 360, 370, 4300, 390 and
z series.
| |
| Robert 2007-09-04, 9:55 pm |
| On Tue, 04 Sep 2007 05:10:29 GMT, "William M. Klein" <wmklein@nospam.netcom.com> wrote:
>Binary (when working with other Binary) may or may not be faster than PD for
>some cases on zSeries. However, there are even MORE options that impact this
>than just TRUNC (which has 3 flavors on IBM zSeries). Furthermore, PD is usually
>(not always) BEST when working with "combined" usages (such as input from a
>"screen" in the same operation as something stored in a Database).
>
>The following is the information on "comparing data types" from the Enterprise
>COBOL Performance paper available at:
> http://www-1.ibm.com/support/docvie...uid=swg27001475
>
>(You might want to look at the entire paper to see what a COMPREHENSIVE set of
>performance tests covers - in the way of "variations". Also it has some firm
>statistics on indexes vs subscripts with this compiler.)
It says an index is 30% faster than a binary subscript. Subscripts require a
multiplication. Both require the addition of base + offset, which is usually 'free', i.e.
the referencing operand is base:index. So the difference is multiply versus load. Machines
can do either in one execution frame. The difference is pipelining -- the load might be
done before the instruction executes.
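The multiply-versus-load point can be shown in source form. This is only a sketch of the arithmetic, not of what any particular compiler generates: it works out the byte displacement for a subscript by hand and uses reference modification to show that it lands on the same element. The names subvsidx, table-area, element-length and byte-offset are assumptions made for the example, and the idea that an index already holds a displacement is true of some implementations but is implementation-defined.

```cobol
       identification division.
       program-id. subvsidx.
       data division.
       working-storage section.
       01  table-area.
           05  array-element     pic x(4) occurs 10 times
                                 indexed by x-index.
       01  s-subscript           binary pic s9(9).
       01  element-length        binary pic s9(9) value 4.
       01  byte-offset           binary pic s9(9).
       procedure division.
       main-para.
           move 'AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJ'
               to table-area
      *> Subscript: the displacement is worked out from the
      *> occurrence number on each reference (a multiply).
           move 7 to s-subscript
           compute byte-offset =
               (s-subscript - 1) * element-length
           display 'subscript 7 : ' array-element (s-subscript)
           display 'by offset   : '
               table-area (byte-offset + 1 : element-length)
      *> Index: SET converts once, ahead of the reference.
           set x-index to 7
           display 'index 7     : ' array-element (x-index)
           goback.
```

All three DISPLAY statements show GGGG, the seventh element. The hand-computed byte-offset is the work a subscripted reference implies each time, while the SET to x-index does the equivalent conversion once.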
>Comparing Data Types
>
>When selecting your data types, it is important to understand the performance
>characteristics of them before you use them. Shown below are some performance
>considerations of doing several ADDs and SUBTRACTs on the various data types of
>the specified precision.
>
> Performance considerations for comparing data types (using ARITH(COMPAT)):
>
> Packed decimal (COMP-3) compared to binary (COMP or COMP-4) with TRUNC(STD)
> using 1 to 9 digits: packed decimal is 30% to 60% slower than binary
> using 10 to 17 digits: packed decimal is 55% to 65% faste | | |