Code Comments
Recently I've been reading about General-Purpose Computation Using Graphics Hardware - http://www.gpgpu.org - and it seems that GPUs can bring quite a good performance when compared to the CPUs.

In other words, a graphics chip on the graphics card can make really heavy computations, and it's easier and cheaper to buy a couple of top-performance graphics cards than to buy a multi-CPU machine (which are quite expensive).

Do you think - theoretically - that a compiler could help compiling software, which would in turn use the power of the GPU to make some of the computations?

Like now we have compiler options like "-mmmx -msse -msse2 -msse3 -m3dnow" - would it be possible to optimize the code of the binary to use the GPU with "-with-nvidia-gpu" or "-with-ati-gpu"?

I would like to hear some theoretical discussion about that.

-- Tomek

Tomasz Chmielewski wrote:
> Recently I've been reading about General-Purpose Computation Using
> Graphics Hardware - http://www.gpgpu.org - and it seems that GPUs can
> bring quite a good performance when compared to the CPUs.
>
> In other words, a graphics chip on the graphics card can make really
> heavy computations, and it's easier and cheaper to buy a couple of
> top-performance graphics cards than to buy a multi-CPU machine (which
> are quite expensive).

Well, most modern CPUs (and GPUs) are multi-unit processors, i.e. similar to a multi-CPU machine, and every modern compiler takes this into account.

Second, from the point of view of performance it's more cost-effective to buy a lot of low-cost graphics cards or CPUs than a couple of top-performance ones. The only problems are the availability and price of very fast buses and networks, and of good and efficient parallel programming languages.

On most current PCs, the fastest bus is AGP or PCI Express, but it usually allows only one graphics card to be attached. This means that at this time the only cheap solution is to use a couple of PCI graphics cards, and only one GPU connected to AGP or PCI Express. Motherboards with multiple AGP/PCI Express buses are much more expensive.

The second problem is that GPUs are heavily oriented towards floating point computations with small precision, which is needed for 3D. This can help with solving differential equations, weather prediction etc., but it will be much more difficult to compute discrete algorithms on these processors.

The third problem is that most graphics card manufacturers don't allow you to bypass their drivers and download code that can run on a GPU. Allowing it would present a security problem for the system, and could also compromise the graphics performance of the cards.

> Do you think - theoretically - that a compiler could help compiling
> software, which would in turn use the power of the GPU to make some
> of the computations?
>
> Like now we have compiler options like "-mmmx -msse -msse2 -msse3
> -m3dnow" - would it be possible to optimize the code of the binary to
> use the GPU with "-with-nvidia-gpu" or "-with-ati-gpu"?
>
> I would like to hear some theoretical discussion about that.

This is called "cross-compilation", and it is possible with most compilers. You'll also need the parts of the code that download the code into the card, allow activation of the code from your CPU, and get the results from the card.

This is doable, and I think we had a discussion on this issue a couple of years ago. The OEM manufacturers of Nvidia or ATI cards can definitely include this in their drivers, together with some "sandbox" system to prevent malfunctioning of the card. The only support you need from the OS is the ability to allocate some video buffers to your process, and to read/write them without interfering too much with the card.

Unfortunately, it's worthwhile to run only sufficiently large parts of the code on a GPU because of the price of communication with the board. MMX/SSE allow much faster communication, and the interface is included as a part of the instruction set. Using a GPU is similar to running a parallel algorithm on an MP machine with some of the memory shared between the processors, and the reason is that the video buses are built for bulk memory transfer and not for message passing. MPs with shared memory are a huge area with hundreds of research papers. In your case you'll have only two heavily connected processors.

Michael
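
To put rough numbers on the point about communication cost, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption made for illustration: the function name `offload_pays_off`, the per-element timings, the bus bandwidth and the round-trip latency are made up and do not describe any real card or bus.

```python
def offload_pays_off(n_elems, cpu_s_per_elem, gpu_s_per_elem,
                     bytes_per_elem=4, bus_bytes_per_s=1e9,
                     round_trip_latency_s=1e-4):
    """Return True if shipping the work to the card beats staying on the CPU.

    All default values are hypothetical placeholders, not measurements.
    """
    cpu_time = n_elems * cpu_s_per_elem
    # Data has to cross the bus twice: upload the inputs, read back the results.
    transfer_time = (round_trip_latency_s
                     + 2 * n_elems * bytes_per_elem / bus_bytes_per_s)
    gpu_time = transfer_time + n_elems * gpu_s_per_elem
    return gpu_time < cpu_time


# With these made-up timings the fixed latency dominates small jobs,
# so only the large job is worth sending to the card.
small_job = offload_pays_off(100, cpu_s_per_elem=1e-7, gpu_s_per_elem=1e-8)
large_job = offload_pays_off(1_000_000, cpu_s_per_elem=1e-7, gpu_s_per_elem=1e-8)
```

The fixed latency term is what makes small code fragments a bad fit, which is the contrast with MMX/SSE drawn above: there the "communication" is just a register move inside the CPU.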

Tomasz Chmielewski wrote:
> In other words, a graphics chip on the graphics card can make really
> heavy computations, and it's easier and cheaper to buy a couple of
> top-performance graphics cards than to buy a multi-CPU machine (which
> are quite expensive).

It depends on what tasks you plan to solve and how many resources you have to develop the system.

> The second problem is that GPUs are heavily oriented towards
> floating point computations with small precision which is needed for
> 3D. This can help with solving differential equations, weather
> prediction etc., but it will be much more difficult to compute
> discrete algorithms on these processors.

From the compiler's point of view you'll spend much time to make an effective compiler. Look at the DSP market - do you know many C compilers which generate instructions for a DSP's MAC unit? Which generate MAC instead of MUL + ADD? I find 0 compilers which do it (even for such DSPs as the Motorola 56K, TI etc.). Besides - can somebody point me to the compilers which do it?

On the other hand, the problem looks very interesting and contains many questions which can be researched.

Best Regards,
Oleg

I've been dabbling around in this area and I would emphatically say "No". You're not going to see those options show up any time soon.

What is GPGPU computing? It's essentially hijacking the rendering pipeline at the shading/texture mapping phase and rendering back to a texture. A texture is a 2D array - so you're running a program on one 2D array and outputting to another 2D array. 3D is a stack of 2D arrays.

There are two reasons why you're not going to see these options show up.

First, the instruction set is __very__ specialized to support common graphics operations, like dot products. These are not too hard to recognize, but a compiler would probably want to take the intrinsic or builtin approach to implementing certain operations. This is why nVidia built the Cg language for implementing algorithms.

Second, GPUs have a very specialized serial memory access pattern, making array operations on random elements far more challenging. So, if the algorithm in question doesn't follow the memory access pattern the GPU is expecting, your compiler will have to coerce or map the code it's got to the GPU target. This is one reason why it's hard to do things like Runge-Kutta on GPUs whereas implicit numerical methods are preferred.

Hope this helps.
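
The "program on one 2D array, output to another 2D array" picture can be modelled on the CPU in a few lines. This is only a sketch of the programming model, not GPU code: `render_to_texture` and `blur_row` are hypothetical names, and a real fragment program would be written in a shading language such as Cg.

```python
def render_to_texture(src, kernel):
    """Run `kernel` once per output texel.

    Each call may read anywhere in `src` (a gather), but it produces
    only the value for its own (x, y) position in the output.
    """
    height, width = len(src), len(src[0])
    return [[kernel(src, x, y) for x in range(width)] for y in range(height)]


def blur_row(src, x, y):
    """Hypothetical 'fragment program': average a texel with its left and
    right neighbours, clamping at the texture edges."""
    width = len(src[0])
    left = src[y][max(x - 1, 0)]
    right = src[y][min(x + 1, width - 1)]
    return (left + src[y][x] + right) / 3.0


texture_in = [[0.0, 3.0, 6.0]]
texture_out = render_to_texture(texture_in, blur_row)
```

Note that the kernel cannot write to an arbitrary output position. An algorithm that wants to scatter results to computed indices has to be reshaped into this gather form first, which is part of the mapping work described above.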

Tomasz Chmielewski wrote:
> Like now we have compiler options like "-mmmx -msse -msse2 -msse3
> -m3dnow" - would it be possible to optimize the code of the binary to
> use the GPU with "-with-nvidia-gpu" or "-with-ati-gpu"?

Modern GPUs are SIMD processors that execute the same program on multiple data elements (e.g. vertices or pixels) in parallel. Restructuring compilers are able to extract such parallelism from loop nests by dependence analysis (determining which loop iterations are independent) and loop transformations (re-ordering iterations while preserving the program's behavior). In addition, you would need to partition the code between CPU and GPU and handle communication of data between the two.
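
As a toy illustration of dependence analysis, consider the simplest possible case: a single loop over one array with constant offsets. The function below is a sketch under that assumption only (the name `iterations_independent` is made up, and real restructuring compilers use far more general tests over whole loop nests).

```python
def iterations_independent(n, write_offset, read_offset):
    """Decide whether the iterations of

        for i in range(n):
            a[i + write_offset] = f(a[i + read_offset])

    can run in any order (or in parallel) without changing the result.

    Iteration i writes cell i + write_offset and iteration j reads cell
    j + read_offset. They touch the same cell when j - i equals
    write_offset - read_offset, so a cross-iteration dependence exists
    exactly when that distance is nonzero and smaller than the trip count.
    """
    distance = write_offset - read_offset
    return distance == 0 or abs(distance) >= n


elementwise = iterations_independent(100, 0, 0)   # a[i] = f(a[i])
recurrence = iterations_independent(100, 1, 0)    # a[i + 1] = f(a[i])
```

The element-wise loop is the kind that maps onto one-program-per-pixel hardware. The recurrence carries a value from each iteration to the next, so it would have to be transformed or left on the CPU.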

Hello!

Oleg V.Boguslavsky <schummi@i.com.ua> wrote:
>[...]
>From the compiler's point of view you'll spend much time to make an
>effective compiler. Look at the DSP market - do you know many C
>compilers which generate instructions for a DSP's MAC unit? Which
>generate MAC instead of MUL + ADD? I find 0 compilers which do it
>(even for such DSPs as the Motorola 56K, TI etc.). Besides - can
>somebody point me to the compilers which do it?

No actual compiler, but doing exactly that was the simplest part of my diploma thesis on some aspects of code generation for DSPs. At least it is a simple task if you use code generator generators which can match subtree (or sub-DAG) patterns that are more complex than operator(operand1,...,operandn).

For people who can read German, the thesis is available on the pages of the university department where I did it: http://www.info.uni-karlsruhe.de/pu...ions.php/id=353

>[...]

Kind regards,

Hannah.
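
To show what matching a pattern deeper than operator(operand1,...,operandn) looks like, here is a small hand-written sketch. It is a greedy top-down matcher, not a code generator generator and not the method of the thesis mentioned above. The tuple tree format, the function name `select` and the `MAC`/`MUL`/`ADD` mnemonics are all invented for the example and do not correspond to any real DSP instruction set.

```python
import itertools


def select(node, code, temps=None):
    """Emit pseudo-instructions for an expression tree into `code`.

    A tree is either a string (a value already in a register) or a tuple
    (op, left, right) with op in {"add", "mul"}. Returns the name of the
    register that holds the result.
    """
    if temps is None:
        temps = itertools.count()
    if isinstance(node, str):
        return node
    op, left, right = node
    if op == "add":
        # Two-level pattern add(x, mul(a, b)): try it before the
        # generic one-operator rule, on either side of the addition.
        for acc, prod in ((left, right), (right, left)):
            if isinstance(prod, tuple) and prod[0] == "mul":
                r_acc = select(acc, code, temps)
                r_a = select(prod[1], code, temps)
                r_b = select(prod[2], code, temps)
                dst = f"t{next(temps)}"
                code.append(f"MAC {dst}, {r_acc}, {r_a}, {r_b}")
                return dst
    # Generic rule: one instruction per operator.
    r_left = select(left, code, temps)
    r_right = select(right, code, temps)
    dst = f"t{next(temps)}"
    code.append(f"{op.upper()} {dst}, {r_left}, {r_right}")
    return dst


code = []
result = select(("add", "acc", ("mul", "x", "y")), code)
```

For the tree acc + x * y this emits a single MAC instead of a MUL followed by an ADD. A generated matcher does the same thing from a table of patterns with costs rather than from hand-written if-statements.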