For Programmers: Free Programming Magazines  


Home > Archive > Compilers > August 2004 > Register allocation









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Register allocation
Abhijit Ray

2004-07-15, 9:01 pm

Are there any published records of how much improvement ( both in code
size and run time ) "register allocation" provides?

I am trying to get the performance estimates of a C program. But at
the moment I am ignoring the act that the number of registers are
limited. I am assuming that the processor has infinite registers. What
I would like to know is what range of error I might run into under
such assumption.

If there are published reports all the better.

Thanks,
Abhijit.
Gopi Bulusu

2004-07-28, 9:08 pm

> I am trying to get the performance estimates of a C program. But at
> the moment I am ignoring the act that the number of registers are
> limited. I am assuming that the processor has infinite registers. What
> I would like to know is what range of error I might run into under
> such assumption.


Let's assume that the processor has only 1-register. Then every use of
a virtual register in your "infinite register set" will need spill
code to be generated which is an overhead of approximately 1-store and
1-load for every register access. That's the theoretical worst case
(actual can be worse. I spent a total of only 30-seconds on this one,
so any corrections are welcome :-)

Btw, among the popular compilers, gcc is an example of a compiler that
makes such an assumption. The reload phase of gcc essentially takes
care of mapping a theoretical infinite register set to the actual
finite register set of a particular processor.

Unless you are dealing with a trivial C program, number of registers
is probably not the most important thing that impacts the error range
of your estimate, a good optimizer (which will include a good global
register allocator) will probably impact it a lot more.

Regards,
gopi

---

Gopi Kumar Bulusu
Sankhya Technologies Private Limited
http://www.sankhya.com
Tel: +91 891 554 2666
Fax: +91 891 554 2665
Rajaram

2004-08-04, 3:57 am

Hi

> Are there any published records of how much improvement ( both in code
> size and run time ) "register allocation" provides?


I think there are many . Anyway checkout

"Register Allocation by Priority-based Coloring" Frederick Chow t and John
Hennessy

It talks about register allocation statistics for benchmark programs(Section
7.1).

Anyway did u google arround !! ;)

RERA

Kamal R. Prasad

2004-08-05, 3:59 pm

gopi@sankhya.com (Gopi Bulusu) wrote in message news:<04-07-074@comp.compilers>...
>
> Let's assume that the processor has only 1-register. Then every use of
> a virtual register in your "infinite register set" will need spill
> code to be generated which is an overhead of approximately 1-store and
> 1-load for every register access. That's the theoretical worst case
> (actual can be worse. I spent a total of only 30-seconds on this one,
> so any corrections are welcome :-)
>


The overhead is 1-load and 1-store, but the overhead isn't as high as
you would expect, thanks to a hierarchy of caches to speed things up.
No doubt the speed with which a register can be accessed is much
higher than a first-level cache access, but the performance difference
between a good and bad allocation strategy may not amount to a lot
[and that is a guess].

regards
-kamal
russell kym horsell

2004-08-09, 3:56 am

Kamal R. Pra <kamalp@acm.org> wrote:
[...]
> The overhead is 1-load and 1-store, but the overhead isn't as high as
> you would expect, thanks to a hierarchy of caches to speed things up.
> No doubt the speed with which a register can be accessed is much

[...]

Measure it, and you'll get a shock. On 5 yo architecture there seems
to be virtually no diff between memory access and registers for common
stuff. That and the h/w scheduling of instrs makes efficient code gen
a bit easier. ;-)
Kamal R. Prasad

2004-08-09, 3:56 am

"Rajaram" <rajaram@acmet.com> wrote
>
> I think there are many . Anyway checkout
>
> "Register Allocation by Priority-based Coloring" Frederick Chow t and John
> Hennessy
>
> It talks about register allocation statistics for benchmark programs(Section
> 7.1).


The original poster wanted one that does NOT use costing. Im not sure
why that (costing) should be a problem -but just to let you know that
"Register allocation by priority based coloring" by Chow Hennessy (ACM
Transaction of Programming Languages and Systems, Vol 12, No 4, Oct
1990) does use some sort of costing to find out which variable to
assign to a register. This algorithm is an improvement over the one
proposed by Chaitin (ref. Chaitin G.J, Register allocation and
spilling via graph coloring In Proceedings of the ACM SIGPLAN 82,
Symposium on Compiler Construction pp 98-105). Chaitin performs
optimization over machine code, whereas Hennessy relies on
intermediate code which the original poster intends to use. Both
Chaitin and Hennessy perform an intra-procedural analysis only. There
are quite a few other papers like:-

-Register allocation across procedure and module boundaries by Vatsa
Santhanam, Daryl Odnert in ACM SIGPLAN notices, June 1990 Vol 25 No 6
-A generalized scheme for graph coloring and register allocation by
Michael D Smith, Normal Ramsey and Glenn Holloway in ACM SIGPLAN
notices, Vol 39, No 6
-some other papers detailing techniques for parallel, embedded
architectures etc.. + optimizations for multi-threaded applications,
which Im not sure are of any interest to the original poster.

regards
-kamal
Gopi Bulusu

2004-08-10, 9:03 pm

kamalp@acm.org (Kamal R. Pra) wrote

> No doubt the speed with which a register can be accessed is much
> higher than a first-level cache access, but the performance difference
> between a good and bad allocation strategy may not amount to a lot
> [and that is a guess].


In reality, It can be very significant on most processors. Reminds me
of one of those (Amdahl's ?) laws which states that it is good to have
either 0, 1 or infinite number of a particular resource (like a
register). The difference between good and bad register allocation can
have significant impact on performance. No wonder, register allocation
tends to be the "buggiest and most difficult to maintain" part of
almost any code generator/optimizer !

Regards,
gopi

---

Gopi Kumar Bulusu
Sankhya Technologies Private Limited
http://www.sankhya.com
Tel: +91 891 554 2666
Fax: +91 891 554 2665
Anton Ertl

2004-08-10, 9:03 pm

kamalp@acm.org (Kamal R. Pra) writes:
>"Register allocation by priority based coloring" by Chow Hennessy (ACM
>Transaction of Programming Languages and Systems, Vol 12, No 4, Oct
>1990) does use some sort of costing to find out which variable to
>assign to a register. This algorithm is an improvement over the one
>proposed by Chaitin (ref. Chaitin G.J, Register allocation and
>spilling via graph coloring In Proceedings of the ACM SIGPLAN 82,
>Symposium on Compiler Construction pp 98-105).


Not really. If you take Chow's technique, remove the live range
splitting, and add a more sophisticated technique for determining the
order of colouring, you arrive at Briggs' technique. Now, instead of
making the spilling decision during colouring, make it when
determining the order of colouring (based on worst-case assumptions),
and you arrive at Chaitin. Since Chow uses less sophisticated
ordering heuristics than Chaitin, I would not call his technique an
improvement over Chaitin's (nor the other way round).

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/home.html
Anton Ertl

2004-08-10, 9:03 pm

russell kym horsell <kym@sdf.lonestar.org> writes:
>Kamal R. Pra <kamalp@acm.org> wrote:
>[...]
>[...]
>
>Measure it, and you'll get a shock. On 5 yo architecture there seems
>to be virtually no diff between memory access and registers for common
>stuff.


It depends very much on the microarchitecture. On most of them
register allocation pays off very well. For some older results of a
micro-benchmark see
http://www.google.at/groups?q=g:thl...ws.tuwien.ac.at

For some more recent results for an application, see
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8092#c6

Executive summary of the latter: For one application (Gforth), using
explicit register allocation in addition to the gcc -O2 register
allocator gives speedups by a factor of 2-5 on an Athlon 1200 (and
similar on a Pentium III) and does not have a big effect on the
Pentium 4.

- anton
--
M. Anton Ertl
anton@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/home.html
kym@kymhorsell.com

2004-08-11, 8:58 pm

> russell kym horsell <kym@sdf.lonestar.org> writes:
> It depends very much on the microarchitecture. On most of them
> register allocation pays off very well. For some older results of a

[...]

I also have some benchmarks for P4s, XP's and some older things using
using SSE1, SSE2, 3Dnow and the old x87 FP units (you'll note
a predelection toward fp work), with 5 different instruction scheduling
stategies, with or without bb-wide register variables. The diff in
speed for well-scheduled code of loop-wide reg vars seems to be only a few
percent.

A much larger impact than whether to put something in a reg is getting rid
of near machine-wide pipe bubbles or stalls by allocating
the right instr packet for the right time-slot, and improving cache and
TLB hits (and even victim cache hits, where applicable) with appropriate
unrolling and blocking.

GCC is a great gen purp re-targetable compiler, but is not really
a high-performance platform.

It seems even basic vectorising/pipeline/vliw compilers generate code
that can run an order of magnitude faster than gcc's code for many
modern desktops.

I have some benchmarks where a a basic home grown F90 compiler generates
unassisted code running 50x than even moderately pre- tinkered C code
using gcc -fwhatever -mwhatever -O[0-8] on the same platform.

Some PD "tarpit" results are available somewhere under
http://junk.kymhorsell.com

----------------------------------------
kym@kym.massbus.org
SDF Public Access UNIX System - http://sdf.lonestar.org
Kamal R. Prasad

2004-08-13, 8:57 pm

anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote in message news:<04-08-050@comp.compilers>...
> russell kym horsell <kym@sdf.lonestar.org> writes:
[snip][color=darkred]
>
> Executive summary of the latter: For one application (Gforth), using
> explicit register allocation in addition to the gcc -O2 register
> allocator gives speedups by a factor of 2-5 on an Athlon 1200 (and
> similar on a Pentium III) and does not have a big effect on the
> Pentium 4.
>
> - anton


There is also another feature i.e hardware multi-threading which helps
overcome latencies arising from a bad allocation strategy. The POWER 4
processor uses this and some other means of extracting parallelism to
ensure that unoptimized code does run efficiently on it.

regards
-kamal
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com