Home > Archive > Compilers > February 2008 > Profile-driven optimization
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Profile-driven optimization
|
|
| Tim Frink 2008-01-30, 4:58 am |
| Hi,
Recent versions of compilers perform profile-driven optimizations,
i.e. they first profile the program by running it and collect
information about the frequency of taken branches and how often basic
blocks have been executed. Based on that, they get one path through
the program that is executed most frequently.
I'm now wondering what sort of optimizations can be performed to that
path. The only optimization I know is block re-ordering by moving
blocks such that unconditional branches are eliminated on the typical
path leading to e.g. an improved pipeline behavior due to missing
conditional hazards.
Most of the optimizations could be applied to the entire code since
they can improve code quality globally so it does not make sense to
perform for example constant propagation just to this one path and
leave the remaining code unchanged. This is the same for other
compiler optimizations that come into my mind.
Do you know any other optimizations that could be applied in
particular to this one path? This must be some sort of optimizations
that, due to any restrictions, cannot be applied to the entire code
but just to some few code fragments (like the profiling path).
Any papers/links/books are appreciated.
Best regards,
Tim
[I wouldn't say it's recent. Multiflow's compiler was using profile
feedback at least 20 years ago. There must be lots of papers since
then. -John]
| |
|
| On Jan 29, 5:29 am, Tim Frink <plfr...@yahoo.de> wrote:
> Recent versions of compilers perform profile-driven optimizations,
> i.e. they first profile the program by running it and collect
> information about the frequency of taken branches and how often basic
> blocks have been executed. Based on that, they get one path through
> the program that is executed most frequently.
>
> I'm now wondering what sort of optimizations can be performed to that
> path. ...
> [I wouldn't say it's recent. Multiflow's compiler was using profile
> feedback at least 20 years ago. There must be lots of papers since
> then. -John]
Straight-line code has no cache flushes, so instruction scheduling can
be more effective. You don't need a special algorithm, though. It
just happens.
You might also repeat blocks to make more than one path straight-line.
If you have A(B|C)D and both paths profile as critical, then you might
compile A(BD|CD) to eliminate a jump at the end of B.
You might also tune the register allocator to specially favor
variables of the critical path for global register allocation with
minimal swaps and spills.
Indiscriminate inlining bloats code, which kills cache performance.
Inlining focused on the critical path makes sense.
| |
| Pertti Kellomäki 2008-02-01, 10:42 pm |
| Tim Frink wrote:
> Based on that, they get one path through
> the program that is executed most frequently.
>
> I'm now wondering what sort of optimizations can be performed to that
> path.
A single instruction of a VLIW machine consists of a (fixed) number
of operations that the machine can execute in parallel. The operations
must be independent, i.e. one operation cannot use the output of
another as an input in the same cycle. This means that a VLIW machine
is potentially capable of impressive amounts of instruction level
parallelism, but only if the compiler is able to find it.
Typical basic blocks are relatively short, which limits the practical
amount of instruction level parallelism that the compiler can exploit.
So the trick that e.g. the Multiflow compiler used is to guess based
on the profiling information which basic blocks will usually be
executed in sequence, and do instruction scheduling (ordering of
instructions) on this larger region. Look up trace scheduling,
superblocks and hyperblocks for more information.
Since the IA-64 architecture can be thought of as VLIW-on-demand,
I suppose this is quite relevant there as well.
--
Pertti
| |
| M Wolfe 2008-02-04, 7:51 pm |
| A compiler may choose to prefer inlining functions along the hot path.
If the profile gives hints as to the loop counts, the compiler can
choose whether (or not) to vectorize or unroll a loop.
But even that's only looking at execution frequency profiles. Profile
feedback can include data as well. For instance, at an indirect
function call, the profile can save the function targets called, and
then the compiler can choose to inline one or more of the targets
(appropriately guarded). Or, at an integer divide, the profile may
save whether the divisor usually has the same value. If so, the
compiler can decide to use the reciprocal multiply method, where it
only has to recompute the reciprocal when the value changes.
| |
| Pasi Ojala 2008-02-10, 7:23 pm |
| On 2008-02-01, Pertti Kellomdki <pertti.kellomaki@tut.fi> wrote:
> So the trick that e.g. the Multiflow compiler used is to guess based
> on the profiling information which basic blocks will usually be
> executed in sequence, and do instruction scheduling (ordering of
> instructions) on this larger region.
We have a DSP with a 3-stage pipeline, and the backend optimizer
/parallelizer /scheduler can use profiling information for just
that. For each conditional jump either the fall-through or jump target
is preferred when moving instructions across basic blocks.
It does help, if you have representative test data.
The main problem is that as code modules get larger, generating
suitable profiling information gets proportionally harder.
-Pasi
|
|
|
|
|