











 

Author GPU-aware compiling?
Tomasz Chmielewski

2005-05-20, 8:58 pm

Recently I've been reading about General-Purpose Computation Using
Graphics Hardware - http://www.gpgpu.org - and it seems that GPUs can
deliver quite good performance compared to CPUs.

In other words, the graphics chip on a graphics card can perform really
heavy computations, and it's easier and cheaper to buy a couple of
top-performance graphics cards than to buy a multi-CPU machine (which
is quite expensive).

Do you think - theoretically - that a compiler could help by compiling
software so that it uses the power of the GPU for some of the
computations?

Just as we now have compiler options like "-mmmx -msse -msse2 -msse3
-m3dnow", would it be possible to optimize the code of the binary to
use the GPU with "-with-nvidia-gpu" or "-with-ati-gpu"?

I would like to hear some theoretical discussion about that.

--
Tomek
Michael Tiomkin

2005-05-22, 3:59 am

Tomasz Chmielewski wrote:
> Recently I've been reading about General-Purpose Computation Using
> Graphics hardware - http://www.gpgpu.org - and it seems that GPUs can
> bring quite a good performance when compared to the CPUs.
>
> In other words, a graphics chip on the graphics card can make really
> heavy computations, and it's easier and cheaper to buy a couple of
> top-performance graphics cards than to buy a multi-CPU machine (which
> are quite expensive).


Well, most modern CPUs (and GPUs) are multi-unit processors, i.e.
similar to a multi-CPU machine, and every modern compiler takes this
into account. Second, in terms of performance it's more
cost-effective to buy a lot of low-cost graphics cards or CPUs than a
couple of top-performance ones. The only problems are the availability
and price of very fast buses and networks, and of good, efficient
parallel programming languages.

On most current PCs, the fastest bus is AGP or PCI Express, but
it usually allows only one graphics card to be attached. This means
that at this time the only cheap solution is to use a couple of PCI
graphics cards plus a single GPU connected to AGP or
PCI Express. Motherboards with multiple AGP/PCI Express buses are much
more expensive.

The second problem is that GPUs are heavily oriented towards
low-precision floating-point computation, which is what 3D
needs. This can help with solving differential equations, weather
prediction etc., but discrete algorithms will be much more difficult
to run on these processors.

The third problem is that most graphics card manufacturers don't
allow you to bypass their drivers and download code that can run on
the GPU. Doing so would present a security problem for the system and
would also compromise the graphics performance of the card.

> Do you think - theoretically - that a compiler could help compiling
> software, which would in turn use the power of the GPU to make some
> of the computations?
>
> Like now we have compiler options like "-mmmx -msse -msse2 -msse3
> -m3dnow" - would it be possible to optimize the code of the binary to
> use the GPU with "-with-nvidia-gpu" or "-with-ati-gpu"?
>
> I would like to hear some theoretical discussion about that.


This is called "cross-compilation", and it is possible with most
compilers. You'll also need the pieces of code that download the
program to the card, let your CPU start it, and fetch the results
from the card. This is doable, and I think we had a discussion
on this issue a couple of years ago. The OEMs of Nvidia
or ATI cards can definitely include this in their drivers, together
with some "sandbox" system to prevent the card from malfunctioning.

The only OS support you need is the ability to allocate
some video buffers to your process, and to read/write them without
interfering too much with the card.

Unfortunately, it's worthwhile to run only sufficiently large parts
of the code on a GPU because of the cost of communicating with the
board. MMX/SSE allow much faster communication, and the interface is
part of the instruction set.
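
A rough back-of-the-envelope version of that trade-off, as a sketch
only: offloading pays off when the time on the CPU exceeds the fixed
call overhead plus the bus transfer plus the time on the GPU. The
function names and every default number below (bus bandwidth, GPU
speedup, call latency) are made-up assumptions for illustration, not
measurements of any real card.

```python
def offload_time(n_bytes, cpu_seconds, gpu_speedup, bus_bytes_per_s, latency_s):
    """Estimated wall time if the work is shipped to the GPU and back."""
    # Assumption: the same amount of data goes up (inputs) and down (results).
    transfer = 2 * n_bytes / bus_bytes_per_s
    return latency_s + transfer + cpu_seconds / gpu_speedup


def offload_pays_off(n_bytes, cpu_seconds, gpu_speedup=10.0,
                     bus_bytes_per_s=1e9, latency_s=1e-3):
    """True if the estimated GPU path beats simply running on the CPU."""
    gpu_path = offload_time(n_bytes, cpu_seconds, gpu_speedup,
                            bus_bytes_per_s, latency_s)
    return gpu_path < cpu_seconds


# A tiny job is dominated by the fixed latency, a big one is not.
small_job = offload_pays_off(n_bytes=1_000, cpu_seconds=1e-4)
large_job = offload_pays_off(n_bytes=100_000_000, cpu_seconds=5.0)
```

With these assumed numbers the small job stays on the CPU and the
large one is worth shipping to the card, which is the point made
above about "sufficiently large parts".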

Using a GPU is similar to running a parallel algorithm on an MP machine
with some of the memory shared between the processors, because
video buses are built for bulk memory transfer rather than
message passing. MPs with shared memory are a huge area with hundreds
of research papers. In your case you'll have only two tightly connected
processors.

Michael
Oleg V.Boguslavsky

2005-05-22, 8:56 pm

Tomasz Chmielewski wrote:
> In other words, a graphics chip on the graphics card can make really
> heavy computations, and it's easier and cheaper to buy a couple of
> top-performance graphics cards than to buy a multi-CPU machine (which
> are quite expensive).


It depends on what tasks you plan to solve and how many resources you
have to develop the system.

> The second problem is that GPUs are heavily oriented towards
> floating point computations with small precision which is needed for
> 3D. This can help with solving differential equations, weather
> prediction etc., but it will be much more difficult to compute
> discrete algorithms on these processors.


From the compiler's point of view, you'll spend a lot of time making
an effective compiler. Look at the DSP market - do you know many C
compilers that generate instructions for a DSP's MAC unit, i.e. that
generate a MAC instead of MUL + ADD? I found 0 compilers that do it
(even for DSPs like the Motorola 56K, TI etc.). Besides - can somebody
point me to compilers that do?

On the other hand, the problem looks very interesting and contains many
open questions that could be researched.

Best Regards,

Oleg
scooter.phd@gmail.com

2005-05-24, 4:02 pm

I've been dabbling around in this area and I would emphatically say
"No". You're not going to see those options show up any time soon.

What is GPGPU computing? It's essentially hijacking the rendering
pipeline at the shading/texture mapping phase and rendering back to a
texture. A texture is a 2D array - so you're running a program on one
2D array and outputting to another 2D array. 3D is a stack of 2D
arrays.
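
To make that model concrete, here is a plain-Python imitation of it,
with no GPU involved: a "kernel" is called once per output position,
may read anywhere in the input array, and its return value is written
only to its own position in a fresh output array. The names
`run_kernel` and `box_blur` are invented for this sketch and are not
part of Cg or any graphics API.

```python
def run_kernel(kernel, src):
    """Call kernel(src, x, y) for every position; collect results in a new 2D array."""
    height, width = len(src), len(src[0])
    return [[kernel(src, x, y) for x in range(width)] for y in range(height)]


def box_blur(src, x, y):
    """Average of an element and its left/right neighbours, clamped at the edges."""
    width = len(src[0])
    left = src[y][max(x - 1, 0)]
    right = src[y][min(x + 1, width - 1)]
    return (left + src[y][x] + right) / 3.0


src = [[0.0, 3.0, 6.0],
       [9.0, 9.0, 9.0]]
dst = run_kernel(box_blur, src)   # src is only read, dst is only written
```

Because every output element depends only on the read-only input,
the calls could run in any order or all at once, which is what makes
this shape of program a fit for the hardware.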

There are two reasons why you're not going to see these options show
up. First, the instruction set is __very__ specialized to support
common graphics operations, like dot products. These are not too hard
to recognize, but you would probably want to take the intrinsic or
builtin approach to implementing certain operations. This is why nVidia
built the Cg language for implementing algorithms. Second, GPUs have a
very specialized serial memory access pattern, making array operations
on random elements far more challenging. So, if the algorithm in
question doesn't follow the memory access pattern the GPU is expecting,
your compiler will have to coerce or map the code it's got to the GPU
target. This is one reason why it's hard to do things like Runge-Kutta
on GPUs, and why implicit numerical methods are preferred.

Hope this helps.
Rob Dimond

2005-05-24, 4:02 pm

Tomasz Chmielewski wrote:

> Like now we have compiler options like "-mmmx -msse -msse2 -msse3
> -m3dnow" - would it be possible to optimize the code of the binary to
> use the GPU with "-with-nvidia-gpu" or "-with-ati-gpu"?


Modern GPUs are SIMD processors that execute the same program on
multiple data elements (e.g. vertices or pixels) in parallel.

Restructuring compilers are able to extract such parallelism from loop
nests by dependence analysis (determining which loop iterations are
independent) and loop transformations (re-ordering iterations while
preserving the program's behavior). In addition, you would need to
partition the code between CPU and GPU and handle communication of
data between the two.
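
As a toy illustration of the dependence-analysis step, consider only
loops of the single shape
`for i in range(n): a[i + write_offset] = f(a[i + read_offset])`.
Iteration i writes element i + write_offset and iteration j reads
element j + read_offset, so two different iterations touch the same
element exactly when the distance write_offset - read_offset is
non-zero and smaller than the trip count. The function name and the
restriction to this one loop shape are assumptions of the sketch;
real restructuring compilers use far more general tests.

```python
def loop_carried_dependence(n, write_offset, read_offset):
    """For `for i in range(n): a[i + write_offset] = f(a[i + read_offset])`,
    return True if some iteration reads an element written by a different one."""
    distance = write_offset - read_offset
    return distance != 0 and abs(distance) < n


# a[i] = f(a[i]): each iteration touches only its own element.
independent = not loop_carried_dependence(n=8, write_offset=0, read_offset=0)

# a[i + 1] = f(a[i]): iteration i + 1 reads what iteration i wrote.
serial_chain = loop_carried_dependence(n=8, write_offset=1, read_offset=0)
```

Only the first kind of loop can be handed to a SIMD target without
further transformation; the second has to keep its iteration order.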
Hannah Schroeter

2005-05-24, 4:02 pm


Hello!

Oleg V.Boguslavsky <schummi@i.com.ua> wrote:
>[...]


>From the compiler's point of view you'll spend much time to make an
>effective compiler. Look at the DSPs market - do you know many C
>compilers which generate instructions for DSP's MAC-unit? which
>generate MAC instead of MUL + ADD? i find 0 compilers, which do it
>(even for such a DSPs like motorola 56K, TI etc.). Besides - can smb.
>point me to the compilers which do it?


I can't point to an actual compiler, but doing exactly that was the
simplest part of my diploma thesis on some aspects of code generation
for DSPs.

At least it is a simple task if you use code generator generators
which can match subtree (or sub-DAG) patterns which are more complex
than operator(operand1,...,operandn).
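
A minimal sketch of what such a multi-node pattern looks like, assuming
expression trees written as nested tuples and a made-up three-address
target with ADD, MUL and MAC instructions. This is a hand-written,
largest-pattern-first selector for illustration only; it is not taken
from the thesis and is not a code generator generator.

```python
import itertools


def select(node, out, temps):
    """Emit instructions for `node` into `out`; return the name holding its value."""
    if isinstance(node, str):              # leaf: a value already in a register
        return node
    op, lhs, rhs = node
    if op == "add":
        # Two-level pattern add(x, mul(a, b)). Addition is commutative,
        # so look for the multiplication on either side.
        for acc_side, mul_side in ((lhs, rhs), (rhs, lhs)):
            if isinstance(mul_side, tuple) and mul_side[0] == "mul":
                acc = select(acc_side, out, temps)
                a = select(mul_side[1], out, temps)
                b = select(mul_side[2], out, temps)
                dst = f"t{next(temps)}"
                out.append(f"MAC {dst}, {acc}, {a}, {b}")   # dst = acc + a * b
                return dst
    # Fallback: the plain one-level pattern operator(operand1, operand2).
    left = select(lhs, out, temps)
    right = select(rhs, out, temps)
    dst = f"t{next(temps)}"
    out.append(f"{op.upper()} {dst}, {left}, {right}")
    return dst


def compile_expr(tree):
    """Return the instruction list for one expression tree."""
    out = []
    select(tree, out, itertools.count())
    return out


program = compile_expr(("add", "s", ("mul", "x", "y")))
# program == ["MAC t0, s, x, y"]
```

A matcher that only knows operator(operand1, operand2) would emit a
MUL followed by an ADD for the same tree; the two-level pattern is
what lets the selector fold them into one instruction.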

For people who can read German, the thesis is available on the pages
of the university department where I did it:

http://www.info.uni-karlsruhe.de/pu...ions.php/id=353

>[...]


Kind regards,

Hannah.







