Home > Archive > Fortran > September 2005 > Optimal programming advice
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Optimal programming advice
|
|
| Glyn Edwards 2005-09-07, 3:57 am |
| Hi,
Does anyone know of any good general guidelines for making code faster? In
fact is it worth bothering at all or are the compilers so good now that
they'll do all the optimization for you?
I ask because a while ago I tried to speed up a routine I was using by
reducing the number of calculations. If I compiled the original and the
reduced version without any -O options then the new code was faster by a
factor of 2. Once I turned the -O options on however the new code was
slower so I must have been hindering the optimizer some how.
Examples I'd be interested would be how fast is checking a number versus
addition and/or multiplication. For instance if I do a complicated
calculation and then multiply it by zero, would it be best to check if the
multiplier is zero first or not bother at all?
I haven't found much of use on the Internet so it may just be a black art
but I'd like to know if anyone had any advice.
Thanks
Glyn
| |
| Arjen Markus 2005-09-07, 7:57 am |
| In general: yes, optimizing actual code is a black art. There is a lot
of literature on creating optimal algorithms, but that is not the same
thing as optimizing the code.
The most common advice regarding optimization of source code is:
For average programmers: don't!
For expert programmers: don't do it now!
Your own experience indicates that this is still very useful advice ;)
Consider the following:
Modern computers try to optimize the execution of programs via
pipelining instructions, caching data and branch-prediction to drop a
few buzzwords.
If you try to be smart by developing code like:
if ( v .eq. 0.0 ) then
result = 0.0
else
result = v**2
endif
you will probably hinder a lot of optimizations regarding registers,
cache and what not.
It might be beneficial if you expect v to be zero most of the time, but
otherwise you better leave it to the compiler.
Having said that, I can think of one thing that would help the compiler
to optimize the execution: avoid jumping through arrays - that would
hinder the caching of data.
All other advice would just be of the same character.
Regards,
Arjen
| |
| David Flower 2005-09-07, 7:57 am |
|
Glyn Edwards wrote:
> Hi,
>
> Does anyone know of any good general guidelines for making code faster? In
> fact is it worth bothering at all or are the compilers so good now that
> they'll do all the optimization for you?
>
> I ask because a while ago I tried to speed up a routine I was using by
> reducing the number of calculations. If I compiled the original and the
> reduced version without any -O options then the new code was faster by a
> factor of 2. Once I turned the -O options on however the new code was
> slower so I must have been hindering the optimizer some how.
>
> Examples I'd be interested would be how fast is checking a number versus
> addition and/or multiplication. For instance if I do a complicated
> calculation and then multiply it by zero, would it be best to check if the
> multiplier is zero first or not bother at all?
>
> I haven't found much of use on the Internet so it may just be a black art
> but I'd like to know if anyone had any advice.
>
> Thanks
>
> Glyn
Bear in mind that there are three things you can optimise:
1) Execution time - this is what you are discussing, and only really
applies to code heavily used
2) Memory - not such a problem as in my youth!
3) Clarity of code - this is frequently forgotton, but is actually
quite important, as opaque code can easily conceal a bug
Dave Flower
| |
| Glyn Edwards 2005-09-07, 7:57 am |
|
> Bear in mind that there are three things you can optimise:
>
> 1) Execution time - this is what you are discussing, and only really
> applies to code heavily used
Good point, I'm interested in speed in this case, memory is not a problem
>
> 2) Memory - not such a problem as in my youth!
>
> 3) Clarity of code - this is frequently forgotton, but is actually
> quite important, as opaque code can easily conceal a bug
I agree, I've written nice(ish) code and now it has been debugged I'd
like to speed up certain bits of it. I ran the code with a profiler to see
where it spent the most time but I've been unsuccessful at speeding those
routines up.
Glyn
| |
| Glyn Edwards 2005-09-07, 7:57 am |
| On Wed, 07 Sep 2005 02:38:36 -0700, Arjen Markus wrote:
> If you try to be smart by developing code like:
>
> if ( v .eq. 0.0 ) then
> result = 0.0
> else
> result = v**2
> endif
>
> you will probably hinder a lot of optimizations regarding registers,
> cache and what not.
I think this is what I've come up against
>
> It might be beneficial if you expect v to be zero most of the time, but
> otherwise you better leave it to the compiler.
>
> Having said that, I can think of one thing that would help the compiler
> to optimize the execution: avoid jumping through arrays - that would
> hinder the caching of data.
Can you elaborate at all? I may well be doing a lot of this
Thanks
Glyn
| |
| Arjen Markus 2005-09-07, 7:57 am |
| For instance:
real, dimension(1:1000,1:1000) :: array
do i = 1,1000
do j = 1,1000
array(i,j) = 1.0
enddo
enddo
will jump through the array - because the first index (i) is updated
later than the second one (j).
So, if, as is most often the case, the array occupies a contiguous
block of memory, you would
first access element 1, then element 1001, then element 2001 etc. After
1000 elements,
you would then access element 2, 2002, 3002, etc.
Reverse the ordering:
do j = 1,1000
do i = 1,1000
array(i,j) = 1.0
enddo
enddo
the elements are now accessed in order: 1, 2, 3, ...
I am not sure, but array operations are supposed to help the compiler
figure out the best
way to achieve the operation (explicit loops as the above are more
difficult to analyse).
So, a loop like the above could be replaced by:
array = 1.0
This is just an example, but you should get the drift.
Regards,
Arjen
| |
| Glyn Edwards 2005-09-07, 7:57 am |
|
> So, a loop like the above could be replaced by:
>
> array = 1.0
>
> This is just an example, but you should get the drift.
Ok, thanks for the help. I'll have a look through my code for examples
like this
Glyn
| |
| Michel OLAGNON 2005-09-07, 7:57 am |
|
Glyn Edwards wrote:
>
> Good point, I'm interested in speed in this case, memory is not a problem
>
Memory access is just time, and may become a problem especially when one
uses memory as if it were not a problem... :-(
| |
| Pierre Asselin 2005-09-07, 7:57 am |
| Glyn Edwards <glynedwards@fastmail.fm> wrote:
> Does anyone know of any good general guidelines for making code faster? In
> fact is it worth bothering at all or are the compilers so good now that
> they'll do all the optimization for you?
Spend a w reading manuals. Find out about the profiling tools
available on your platform so you know what parts of the code
you should optimize. Read your compiler manuals, they may have
a chapter on optimization and profiling. Do *not* attempt any
optimizations before you have positively identified the bottlenecks.
--
pa at panix dot com
| |
| Glyn Edwards 2005-09-07, 6:59 pm |
| On Wed, 07 Sep 2005 12:52:26 +0000, Pierre Asselin wrote:
> Spend a w reading manuals. Find out about the profiling tools
> available on your platform so you know what parts of the code
> you should optimize. Read your compiler manuals, they may have
> a chapter on optimization and profiling. Do *not* attempt any
> optimizations before you have positively identified the bottlenecks.
Hence the second part of the post, after having read the manuals etc... I
only seemed to be able to impede the optimizer.
Glyn
| |
| Richard E Maine 2005-09-07, 6:59 pm |
| In article <1126085916.831135.220030@g44g2000cwa.googlegroups.com>,
"Arjen Markus" <arjen.markus@wldelft.nl> wrote:
> The most common advice regarding optimization of source code is:
> For average programmers: don't!
> For expert programmers: don't do it now!
Along the same vein... and this stuff is so often cited as to be
cliches, but it is still good advice.
1. First make the code work correctly before worrying about
optimization. That's related to the statements that Markus made above.
There isn't much point in optimizing wrong answers. As obvious as this
might seem, it clearly isn't obvious enough, because it happens a *LOT*
that people spend much effort in optimizing wrong answers. I cannot
overemphasize this.
2. Others have commented on the importance of code clarity, which I also
agree on. Unclear code is more likely to be incorrect code, or perhaps
become incorrect code during maintenance. See point 1. Also unclear code
is likely to be harder to optimize, both for the human, and on occasion
for the compiler's optimizer.
3. When you do get to working on optimization, the first thing to
optimize is the algorithm instead of the code. Good choice of algorithm
can and sometimes does make a difference of many orders of magnitude.
Yes, I meant both the "many" and the "orders of magnitude" literally.
This is also related to the statements Markus made above in that it is
common for people to jump into optimizing the code for a poor algorithm.
4. When and if you finally get to optimizing code, make actual
measurements, starting with actual measurements to tell you what parts
of the code are the real bottlenecks. Sounds like you have been doing
some of this, which is good, but it is so important that it is worth the
reminder anyway. After all, someone else might also read this. There is
a long history of people making changes that they were absolutely sure
would improve speed, but in fact had the opposite effect. I've done it.
5. Be sure to consider the cost effectiveness of effort put into
optimization. There do exist cases where a 5% improvement in speed can
be worth large amounts of effort. (For example, you have a real-time
simulator and the 5% is the difference between the simulator working or
not, as can happen. It isn't just a matter of getting results 5% faster.
And you can't replace the hardware with the faster stuff that is
probably already on the market by the time you are doing this.) But
those cases are rare.
6. Remember that low-level code optimization issues change over time as
computer architectures change. Optimizations that helped 10 years ago
can hurt today. Yes, it has happened; I could cite specific cases. And
it will probably happen again.
7. When you finally pass all the above steps and get serious about code
optimization... well... as others have mentioned, it is very much an
art, which takes lots of study, experience, and a certain knack to get
really good at. You can't learn it completely from a few short pointers.
On the other hand, there are one or two pointers of things that can
sometimes have large return for small effort. The biggest of these is
memory access patterns. In todays architectures, memory access is often
a dominating factor in speed, often outweighing the actual
"computations". This is one of those things that have changed over time,
so you'll often get counterproductive advice if you look at things from
20 years ago. Others in the thread have given a few specific pointers on
the memory access issue.
--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain | experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain
| |
| glen herrmannsfeldt 2005-09-07, 6:59 pm |
| Glyn Edwards wrote:
> Does anyone know of any good general guidelines for making code faster? In
> fact is it worth bothering at all or are the compilers so good now that
> they'll do all the optimization for you?
In general for most modern machines, it is often hard to know which
code will be optimal. Compilers are not good enough to do all
optimizations, but they do many of them. Also, some that in the past
were helpful can make it harder for the optimizing compiler.
I suppose the most general guideline I can think of is to make sure
that the compiler knows what you are trying to do.
> I ask because a while ago I tried to speed up a routine I was using by
> reducing the number of calculations. If I compiled the original and the
> reduced version without any -O options then the new code was faster by a
> factor of 2. Once I turned the -O options on however the new code was
> slower so I must have been hindering the optimizer some how.
To do source optimizing you really need to know what optimizing the
compiler is doing. Common subexpression elimination has been done
by optimizing compilers for a long time, usually better than hand
assigning an expression to a temporary variable. You can't be
sure that the compiler will find them, though, and it is often more
readable and easier to debug having a complicated expression
in only one place.
> Examples I'd be interested would be how fast is checking a number versus
> addition and/or multiplication. For instance if I do a complicated
> calculation and then multiply it by zero, would it be best to check if the
> multiplier is zero first or not bother at all?
One of the biggest problems in modern machines is keeping the pipelines
full. Conditional branches are one of the main limits on speed, and
so it is rarely helpful to do such checks. If, as you ask, addition or
multiplication could be speeded up with a check the processor could do
that internally, and it might be that many do. For more complicated
expressions it is harder to say, though. Branch prediction logic is
one of the most important features of newer processors, but it isn't
perfect, and never will be.
> I haven't found much of use on the Internet so it may just be a black art
> but I'd like to know if anyone had any advice.
In the past, one tool was to look at the generated assembly
code. With newer machines, especially RISC machines but others
also, it is not easy to look at a sequence of instructions and
predict how fast it will run.
The most important, as others have said, is to be sure to access
arrays in the order they are stored in memory. For the simple
cases array expressions are probably optimal, for more complicated
cases they can require temporary arrays that otherwise wouldn't be
needed. Array expressions are not the general answer to fast code,
but sometimes will be faster.
-- glen
| |
| Kevin G. Rhoads 2005-09-07, 6:59 pm |
| >Does anyone know of any good general guidelines for making code faster?
1) Profiling and measurement. Theories are nice, data is better.
2) Time/Space trade-offs -- precompute tables and look-up, but don't assume that is faster, measure it.
3) Trading clarity for speed is ALWAYS a mistake, except in throw-away code. (No code is *really* throw-away.)
4) Optimize from high level to lower levels (e.g., don't spend lots of time trying to hand optimize some sub in
assembly until the last (probably never) step). Major code organization issues first! Loop orders on multi-
dimensional arrays &c.
5) Will a 10% speed improvement really mean anything to anyone? Or do you need 10x for it to be worthwhile?
| |
| Kevin G. Rhoads 2005-09-07, 6:59 pm |
| >> This is just an example, but you should get the drift.
>
>Ok, thanks for the help. I'll have a look through my code for examples
BE WARNED. Index order differs for differing languages. If you stick to Fortran, no
problem, but C is the other way around. So mixed language stuff is a "gotcha" waiting
to happen.
| |
| David Jones 2005-09-07, 6:59 pm |
| glen herrmannsfeldt wrote:
>
> To do source optimizing you really need to know what optimizing the
> compiler is doing. Common subexpression elimination has been done
> by optimizing compilers for a long time, usually better than hand
> assigning an expression to a temporary variable. You can't be
> sure that the compiler will find them, though, and it is often more
> readable and easier to debug having a complicated expression
> in only one place.
>
An important thing, similar but different to the above, is to try to
avoid repeatedly recalculating the same quantities. For example, you
may have a deep set of loops where some of the intermediate quantities
being used depend on only a few of the index variables: then you can
try calculating these and storing them in an array within a smaller
set of loops before starting the main loops. You may have several sets
of loops where the same intermediate quantities are calculated and the
same improvement can be made.
| |
| Gordon Sande 2005-09-07, 6:59 pm |
| On 2005-09-07 12:40:30 -0300, Richard E Maine <nospam@see.signature> said:
> In article <1126085916.831135.220030@g44g2000cwa.googlegroups.com>,
> "Arjen Markus" <arjen.markus@wldelft.nl> wrote:
>
>
> Along the same vein... and this stuff is so often cited as to be
> cliches, but it is still good advice.
>
> 1. First make the code work correctly before worrying about
> optimization. That's related to the statements that Markus made above.
> There isn't much point in optimizing wrong answers. As obvious as this
> might seem, it clearly isn't obvious enough, because it happens a *LOT*
> that people spend much effort in optimizing wrong answers. I cannot
> overemphasize this.
>
> 2. Others have commented on the importance of code clarity, which I
> also agree on. Unclear code is more likely to be incorrect code, or
> perhaps become incorrect code during maintenance. See point 1. Also
> unclear code is likely to be harder to optimize, both for the human,
> and on occasion for the compiler's optimizer.
>
> 3. When you do get to working on optimization, the first thing to
> optimize is the algorithm instead of the code. Good choice of algorithm
> can and sometimes does make a difference of many orders of magnitude.
> Yes, I meant both the "many" and the "orders of magnitude" literally.
> This is also related to the statements Markus made above in that it is
> common for people to jump into optimizing the code for a poor algorithm.
To reinforce the point - if you try to optimize early you will find that
it is much harder to recognize where you can improve the algorithm.
Knuth is often credited with the quote "Premature optimization is the root
of all evil".
> 4. When and if you finally get to optimizing code, make actual
> measurements, starting with actual measurements to tell you what parts
> of the code are the real bottlenecks. Sounds like you have been doing
> some of this, which is good, but it is so important that it is worth
> the reminder anyway. After all, someone else might also read this.
> There is a long history of people making changes that they were
> absolutely sure would improve speed, but in fact had the opposite
> effect. I've done it.
Be aware that some things can not be improved, like input/output, so
attempts at
optimization will be on no use. There are many stories of folks who
polish their
code including adding some extra output for clarity and the whole thing
runs slower.
> 5. Be sure to consider the cost effectiveness of effort put into
> optimization. There do exist cases where a 5% improvement in speed can
> be worth large amounts of effort. (For example, you have a real-time
> simulator and the 5% is the difference between the simulator working or
> not, as can happen. It isn't just a matter of getting results 5%
> faster. And you can't replace the hardware with the faster stuff that
> is probably already on the market by the time you are doing this.) But
> those cases are rare.
To make the point remember that much of the hype about optimization
dates back to
when mainframes were 1/2 MHz (that is 2 microsecond cycle time but with more
work per cycle and 32k words of memory was a fair bit) and too much of the
audience is the equivalent of the other guys in the bar. (How else do you get
the n'th paper on some already overstudied topic published to make a longer
resume for the granting committees?)
> 6. Remember that low-level code optimization issues change over time as
> computer architectures change. Optimizations that helped 10 years ago
> can hurt today. Yes, it has happened; I could cite specific cases. And
> it will probably happen again.
>
> 7. When you finally pass all the above steps and get serious about code
> optimization... well... as others have mentioned, it is very much an
> art, which takes lots of study, experience, and a certain knack to get
> really good at. You can't learn it completely from a few short
> pointers. On the other hand, there are one or two pointers of things
> that can sometimes have large return for small effort. The biggest of
> these is memory access patterns. In todays architectures, memory access
> is often a dominating factor in speed, often outweighing the actual
> "computations". This is one of those things that have changed over
> time, so you'll often get counterproductive advice if you look at
> things from 20 years ago. Others in the thread have given a few
> specific pointers on the memory access issue.
It is hard to improve on Richard's nice summary of the standard issues
other than
to recast some parts in a form that might make them stick a bit better.
| |
| glen herrmannsfeldt 2005-09-07, 6:59 pm |
| Kevin G. Rhoads wrote:
[color=darkred]
> 1) Profiling and measurement. Theories are nice, data is better.
Especially find the one routine using most of the time and speed
that up. More or less, Amdahl's law.
> 2) Time/Space trade-offs -- precompute tables and look-up,
> but don't assume that is faster, measure it
Table lookup algorithms are often better than others.
> 3) Trading clarity for speed is ALWAYS a mistake, except in throw-away code.
> (No code is *really* throw-away.)
This one I don't agree with. I was thinking about how to say it
and then read Richard Maine's post. Consider sorting algorithms.
The O(n**2) sorts are likely easier to read and understand, but can
be a lot slower than the O(n log n) sorts. For one statement it might
be true, but at the algorithm level I don't believe it is. As a
reminder, Richard suggests:
> 3. When you do get to working on optimization, the first thing to
> optimize is the algorithm instead of the code. Good choice of
> algorithm can and sometimes does make a difference of many orders
> Yes, I meant both the "many" and the "orders of magnitude" literally.
> This is also related to the statements Markus made above in that
> it is common for people to jump into optimizing the code for a poor
> algorithm.
> 4) Optimize from high level to lower levels (e.g., don't spend lots
> of time trying to hand optimize some sub in assembly until the
> last (probably never) step).
> Major code organization issues first! Loop orders on
> multi-dimensional arrays &c.
I agree.
> 5) Will a 10% speed improvement really mean anything to anyone?
> Or do you need 10x for it to be worthwhile?
It still surprises me how that people bother to make processors
10% faster and then sell them for much higher prices for about
the same reason.
-- glen
| |
| James Van Buskirk 2005-09-07, 6:59 pm |
| "Glyn Edwards" <glynedwards@fastmail.fm> wrote in message
news:pan.2005.09.07.08.36.33.817828@fastmail.fm...
> Does anyone know of any good general guidelines for making code faster? In
> fact is it worth bothering at all or are the compilers so good now that
> they'll do all the optimization for you?
Compilers will most definitely not do all the optimizing for you.
I find compiled code to be perhaps 20-30% effective compared to
ideal code. What I do is to determine which parts of the code
are taking the most time; Fortran doesn't provide a timer accurate
enough to separate out one line of code from the next, and separating
out a given line and testing it in isolation can greatly distort
the performance consequences of that line. Accordingly, I use an
assembly language function to time short stretches of code; examples
for some x86 compilers can be found at
http://home.comcast.net/~kmbtib/real_ft.ZIP .
Having done this and determined where most of the time is actually
being spent, I try to come up with a model for ideal code and how
much time it should actually take. Then I try to figure out why
it's actually taking longer than this. Looking at disassemblies
can be very informative in this regard. You need to know quite a
bit about the processor in question to do this intelligently. Go
to the website of your processor manufacturer to find documentation;
this should be easy to find if the manufacturer is not Intel. Run
some experiments so that you can see the effects of cache thrashing,
TLB thrashing, branch misprediction, and load-store overload on
your processor. Just as programming correct code can't be learned
by reading documentation alone but is a skill that must be developed
through use, writing fast code isn't a purely theoretical pursuit.
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
| |
| E. Robert Tisdale 2005-09-07, 6:59 pm |
| Kevin G. Rhoads wrote:
>
> BE WARNED. Index order differs for differing languages.
> If you stick to Fortran, no problem, but C is the other way around.
> So mixed language stuff is a "gotcha" waiting to happen.
No.
The only difference between Fortran and C arrays is that
Fortran array subscripts appear in order from left to right and
C array subscripts appear in *reverse* order from right to left.
The elements of array(m, n) and array[n][m]
are stored *exactly* the same way in memory!
To avoid *striding*, you might write
integer i, j
do j = 1, n
do i = 1, m
array(i, j) = 1.0
end do
end do
in Fortran or
for (int j = 0; j < n; ++j)
for (int i = 0; i < m; ++i)
array[j][i] = 1.0;
in C.
Fortran and C arrays appear to be transposes of each other
simply because of the order in which the subscripts appear.
| |
| E. Robert Tisdale 2005-09-07, 6:59 pm |
| Glyn Edwards wrote:
> Does anyone know of any good general guidelines for making code faster?
> In fact is it worth bothering at all or are the compilers so good now
> that they'll do all the optimization for you?
Compilers won't implement block algorithms for you yet but
there is an implementation of the Basic Linear Algebra Subprogram (BLAS)
library that automatically "tunes" block algorithms for your platform:
Automatically Tuned Linear Algebra Software (ATLAS)
http://math-atlas.sourceforge.net/
| |
| glen herrmannsfeldt 2005-09-07, 6:59 pm |
| E. Robert Tisdale wrote:
> Kevin G. Rhoads wrote:
(snip)
[color=darkred]
> No.
> The only difference between Fortran and C arrays is that
> Fortran array subscripts appear in order from left to right and
> C array subscripts appear in *reverse* order from right to left.
> The elements of array(m, n) and array[n][m]
> are stored *exactly* the same way in memory!
That is the real question.
When doing matrix arithmetic in Fortran one must choose
which subscript to call "row" and which to call "column"
in the mathematical sense. If, for example, one considers
a matrix on a printed page, or in a text file on disk,
one tends to call a row one line on the page or in the
file. In that case, the natural way to read them in
puts the row number as the second subscript and the
column number as the first.
Mathematical notation normally puts the row number first,
and the column number second. One must choose, then,
and as far as I know matrix software has been done
both ways.
Now, note what the previous post said. Index order, that is,
the order of the subscripts, is different. For calls between
Fortran and other languages that is what you need to know.
-- glen
| |
| Greg Lindahl 2005-09-07, 6:59 pm |
| In article <1126092906.072077.218720@z14g2000cwz.googlegroups.com>,
Arjen Markus <arjen.markus@wldelft.nl> wrote:
>For instance:
>
>real, dimension(1:1000,1:1000) :: array
>
>do i = 1,1000
> do j = 1,1000
> array(i,j) = 1.0
> enddo
>enddo
>
>will jump through the array -
No, it won't. A good compiler will swap the loops and do it in the
right order.
Don't worry, be happy.
>I am not sure, but array operations are supposed to help the compiler
>figure out the best
>way to achieve the operation (explicit loops as the above are more
>difficult to analyse).
This is not true for many compilers. For example, PathScale's compiler
converts all array syntax into explicit loops, and then optimizes the
explicit loops. It's the only way to be sure that you optimize both
well.
-- greg
| |
| Greg Lindahl 2005-09-07, 6:59 pm |
| In article <431f1452$1@news.nwl.ac.uk>, David Jones <dajxxx@ceh.ac.uk> wrote:
> An important thing, similar but different to the above, is to try to
>avoid repeatedly recalculating the same quantities. For example, you
>may have a deep set of loops where some of the intermediate quantities
>being used depend on only a few of the index variables: then you can
>try calculating these and storing them in an array within a smaller
>set of loops before starting the main loops.
.... or you can use a good compiler, which will insert an array
temporary in this case. Yeah, this is a hard optimization for a
compiler to do, but some still manage to find it.
-- greg
| |
| glen herrmannsfeldt 2005-09-07, 9:56 pm |
| Greg Lindahl wrote:
> In article <1126092906.072077.218720@z14g2000cwz.googlegroups.com>,
> Arjen Markus <arjen.markus@wldelft.nl> wrote:
[color=darkred]
[color=darkred]
> No, it won't. A good compiler will swap the loops
> and do it in the right order.
> Don't worry, be happy.
And what about not so good compilers?
Also, as the expression gets more complicated it will be
harder for the compiler to know. If you call non-pure
(is that the term?) functions the compiler can't change
the order.
[color=darkred]
> This is not true for many compilers. For example, PathScale's compiler
> converts all array syntax into explicit loops, and then optimizes the
> explicit loops. It's the only way to be sure that you optimize both
> well.
That does seem to be the best way to me, too.
-- glen
| |
| Mr Hrundi V Bakshi 2005-09-07, 9:56 pm |
|
"Greg Lindahl" <lindahl@pbm.com> wrote in message
news:431f77c2$1@news.meer.net...
> In article <1126092906.072077.218720@z14g2000cwz.googlegroups.com>,
> Arjen Markus <arjen.markus@wldelft.nl> wrote:
>
> No, it won't. A good compiler will swap the loops and do it in the
> right order.
>
Baloney, stop spamming on behalf of your personal rice bowl, you present as
a phoney in the style of wa wa Lionel, the Zeilig of Fortran: you on ganga
or perhaps your ponytail is in a knot or knit?, a good compiler will at
compile time do:
array = 1.0
as the Dutchman remarked
> Don't worry, be happy.
OK. However,
>
>...PathScale's compiler converts all array syntax into explicit loops, and
then optimizes the
> explicit loops. It's the only way to be sure that you optimize both well.
Here we go again, PathofthePaddle!
| |
| Jan Vorbrüggen 2005-09-08, 7:58 am |
| > This is not true for many compilers. For example, PathScale's compiler
> converts all array syntax into explicit loops, and then optimizes the
> explicit loops. It's the only way to be sure that you optimize both
> well.
That is of course possible for that direction. However, a sequence of array
expressions can sometimes be packed into one loop that does less work, e.g.,
because temporaries can be re-used or not generated at all. Such an optimi-
zation might, of course, require information that programmer knows but that
the compiler cannot or cannot readily infer. Thus, there still can be cases
were the loop (nest) is faster or easier to optimize compared to the other-
wise equivalent set of array expressions.
Jan
| |
| Greg Lindahl 2005-09-08, 7:58 am |
| In article <3oammsF4pqoqU1@individual.net>,
Jan Vorbrüggen <jvorbrueggen-not@mediasec.de> wrote:
>That is of course possible for that direction. However, a sequence of array
>expressions can sometimes be packed into one loop that does less work, e.g.,
This is called loop fusion, and we do it on explicit loops. We can
create or eliminate temporaries, whether scalars or arrays. Array
notation doesn't make this easier.
These are fairly standard optimizations, and many compilers other than
ours can do this optimization with explicit loops.
-- greg
(working for, not speaking for, PathScale.)
| |
| Jan Vorbrüggen 2005-09-08, 6:59 pm |
| > This is called loop fusion, and we do it on explicit loops. We can
> create or eliminate temporaries, whether scalars or arrays. Array
> notation doesn't make this easier.
......
Did you mean to say "harder"? That I would understand.
Yes, I know loop fusion. Is the state of (PathScale's) art in loop fusion
such that I, as a programmer, shouldn't bother? Or are there still (fairly)
common cases a human being can see how to fuse them where the/your compiler
cannot? Just asking...I have no good idea of how good compilers have become
at this.
Jan
| |
| Glyn Edwards 2005-09-08, 6:59 pm |
| On Wed, 07 Sep 2005 16:37:29 -0700, Greg Lindahl wrote:
> In article <431f1452$1@news.nwl.ac.uk>, David Jones <dajxxx@ceh.ac.uk> wrote:
>
>
> ... or you can use a good compiler, which will insert an array
> temporary in this case. Yeah, this is a hard optimization for a
> compiler to do, but some still manage to find it.
>
> -- greg
In fact this is one of the approaches I tried, after profiling I
rearranged the loops to prevent such repeated calculations. Unfortunately
this resulted in slower code after the compiler had optimized it hence the
post.
Thanks for the advice though, this thread is proving very useful
Glyn
| |
| Brooks Moses 2005-09-09, 3:57 am |
| Glyn Edwards wrote:
> Does anyone know of any good general guidelines for making code faster? In
> fact is it worth bothering at all or are the compilers so good now that
> they'll do all the optimization for you?
One general guideline: Profile the code. Only optimize the stuff that's
actually taking up time.
Another guideline: Optimization is an experimental process, not a
theoretical one. This means you need to do experiments.
> I ask because a while ago I tried to speed up a routine I was using by
> reducing the number of calculations. If I compiled the original and the
> reduced version without any -O options then the new code was faster by a
> factor of 2. Once I turned the -O options on however the new code was
> slower so I must have been hindering the optimizer some how.
>
> Examples I'd be interested would be how fast is checking a number versus
> addition and/or multiplication. For instance if I do a complicated
> calculation and then multiply it by zero, would it be best to check if the
> multiplier is zero first or not bother at all?
Another bit of advice: this stuff is often rather sensitive to the
details of the particular case, to the system, to the processor
involved, and so forth. It's relatively simple to write up some tests
to do this sort of thing -- though you do want to be careful about it.
Run the tests with no optimization to make sure that the code is
actually testing exactly what you tell it to, and see some of James Van
Buskirk's comments in the recent "[CHALLENGE] finding rightmost zero
bit" thread about using realistic sample data.
In any case, though, a reasonable test is likely to provide you with at
least as accurate an answer to this sort of question as you'd get from
looking for someone else's answer.
Actually, while you probably want to do some simple tests with the
optimizer off, you should also do tests with the optimizer on and with
most of your full code in there. These changes are fairly simple to
code up -- write a version of your code that uses one, and a version
that uses the other, and benchmark them against each other.
In particular, in the case that you ask about, this is going to depend
quite heavily on a lot of the small details -- if the cases where it's
zero are quite predictable and you're only doing one number at a time,
the check may take effectively zero time. If they're not predictable
and the arithmetic is in a vectorized loop, the check is liable to break
the vectorization and slow things down a lot. The only real way to know
is to try them both out on realistic data in a realistic subroutine.
- Brooks
--
The "bmoses-nospam" address is valid; no unmunging needed.
| |
| Brooks Moses 2005-09-09, 3:57 am |
| Richard E Maine wrote:
> 7. When you finally pass all the above steps and get serious about code
> optimization... well... as others have mentioned, it is very much an
> art, which takes lots of study, experience, and a certain knack to get
> really good at. You can't learn it completely from a few short pointers.
> On the other hand, there are one or two pointers of things that can
> sometimes have large return for small effort. The biggest of these is
> memory access patterns. In todays architectures, memory access is often
> a dominating factor in speed, often outweighing the actual
> "computations". This is one of those things that have changed over time,
> so you'll often get counterproductive advice if you look at things from
> 20 years ago. Others in the thread have given a few specific pointers on
> the memory access issue.
One thing along those lines -- and I _know_ this is somewhat completely
ignoring the whole "do the experiments yourself, because things differ"
stuff that I (and everyone else) just posted, but....
On a 20-year-old code that I was working with not too long ago, which
did 2-D hydrodynamic calculations with 5-variable equations, the author
put all the variables in a single three-dimensional array such that all
of the different variables for a single physical location were stored in
adjacent memory.
That is, instead of Ux[i,j], Uy[i,j], P[i,j], and so forth, he had
W[i,j,n], where W[i,j,1] was the x-velocity, W[i,j,2] was the
y-velocity, W[i,j,3] was the pressure, and so forth.
My guess is that this was done to speed up memory accesses, since those
variables were often accessed at the same time.
Was this likely to have actually done anything useful, speedwise, 20
years ago? Is it likely to do anything useful now?
(And: Oops. I just realized: W[i,j,n] in the above example is the wrong
order in Fortran to actually do what I was thinking this did, isn't it?
It should be W[n,i,j]. Which means it's completely useless for
that, and I have no idea why it was done this way.... Still, the
question is one I'm curious about, anyhow.)
- Brooks
--
The "bmoses-nospam" address is valid; no unmunging needed.
| |
| Brooks Moses 2005-09-09, 3:57 am |
| glen herrmannsfeldt wrote:
> Kevin G. Rhoads wrote:
>
> This one I don't agree with. I was thinking about how to say it
> and then read Richard Maine's post. Consider sorting algorithms.
> The O(n**2) sorts are likely easier to read and understand, but can
> be a lot slower than the O(n log n) sorts. For one statement it might
> be true, but at the algorithm level I don't believe it is.
That's also very true in computational physics codes, too -- the orders,
oddly enough, aren't much different, but the values of n in the O(n**2)
and O(n log n) can be tens of thousands or more, so it's a tremendous
difference.
In my experience with CFD, trading clarity for algorithm speed has
_only_ been worth it on the throw-away code, which is exactly the
opposite of how Kevin has it!
(On the other hand, writing the more complicated algorithms as clearly
as possible, including commenting the tricky bits, has definitely been
worth the time pay-off.)
- Brooks
--
The "bmoses-nospam" address is valid; no unmunging needed.
| |
| Jan Vorbrüggen 2005-09-09, 3:57 am |
| > That is, instead of Ux[i,j], Uy[i,j], P[i,j], and so forth, he had
> W[i,j,n], where W[i,j,1] was the x-velocity, W[i,j,2] was the
> y-velocity, W[i,j,3] was the pressure, and so forth.
>
> My guess is that this was done to speed up memory accesses, since those
> variables were often accessed at the same time.
>
> Was this likely to have actually done anything useful, speedwise, 20
> years ago? Is it likely to do anything useful now?
>
> (And: Oops. I just realized: W[i,j,n] in the above example is the wrong
> order in Fortran to actually do what I was thinking this did, isn't it?
> It should be W[n,i,j]. Which means it's completely useless for that,
> and I have no idea why it was done this way.... Still, the question is
> one I'm curious about, anyhow.)
As you note, the actual example you have is not different from the "naive"
way of using different variable names.
The tradeoff you are asking is comparing arrays of structures to structures
of arrays. As with any consideration of caches in a system, you need to ask
what the spatio-temporal locality in your algorithm is. Thus, if you are
always accessing all variables at one "point" (for lack of a better general
term) at the same time, go for the array of structure layout. If you are
primarily using just one - or perhaps a subset - of the data at a point
all of the time, and the rest only sometimes or rarely, use the structure of
arrays or even a mixed approach, i.e., break up the variable set into the
pieces used often and those used rarely. And then there are the alignment
considerations...
But with the regard to your more general question: the answers are yes
(unless running on a Cray), and yes.
Jan
| |
| David Flower 2005-09-09, 3:57 am |
|
Kevin G. Rhoads wrote:
>
> . . . Theories are nice, data is better.
I'm in pedant mode:
'data are better' please
Dave Flower
|
|
|
|
|