| Author |
Using FPGAs on AMD's Opteron to accelerate servers 20-100 times
|
|
| aleph0 2006-10-23, 7:55 am |
| I'm starting a new thread as the old HW thread I started is getting
rather too large.
//
Celoxica accelerates AMD Opteron servers
by Steve Bush
Monday 23 October 2006
Oxfordshire-based Celoxica has released its FPGA-based accelerator
boards for AMD Opteron servers.
"We had to provide a product that is at least ten times faster than a
server that costs the same," Jeff Jussel,
v-p of marketing at the firm, told EW. "We are seeing 20-200 times."
...
...
...
"You take a piece of code that can be accelerated, usually a few hundred
lines of code, and replace it with an application programme interface
call to a new routine that is running in the FPGA," said Jussel.
...
...
[url]http://www.electronicsweekly.com/Articles/2006/10/23/39988/Celoxica+accelerates+AMD+Opteron+servers.htm[/url]
//
So they're doing what, IMO, could easily be done within the
APL interpreter!
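
A minimal sketch of that idea, assuming an interpreter that keeps its primitives in a lookup table. Everything here is hypothetical: `FakeAccelerator`, `build_primitives` and the `"+.x"` key are invented for illustration and are not Celoxica's API or part of any real APL interpreter. It only shows the pattern Jussel describes, where a hot routine is swapped for a call with the same signature that runs elsewhere.

```python
from typing import Callable, Dict, Optional, Sequence

Vector = Sequence[float]
Primitive = Callable[[Vector, Vector], float]


def inner_product_software(a: Vector, b: Vector) -> float:
    """Plain-CPU inner product (what APL spells as +.x)."""
    if len(a) != len(b):
        raise ValueError("length error")
    return sum(x * y for x, y in zip(a, b))


class FakeAccelerator:
    """Hypothetical stand-in for an FPGA board.

    It computes on the CPU and counts calls, so the sketch runs
    without any hardware attached.
    """

    def __init__(self) -> None:
        self.calls = 0

    def inner_product(self, a: Vector, b: Vector) -> float:
        if len(a) != len(b):
            raise ValueError("length error")
        self.calls += 1
        return sum(x * y for x, y in zip(a, b))


def build_primitives(accel: Optional[FakeAccelerator] = None) -> Dict[str, Primitive]:
    """Return the primitive table, rebinding +.x if a board is present."""
    table: Dict[str, Primitive] = {"+.x": inner_product_software}
    if accel is not None:
        # Same name, same signature: user code does not change.
        table["+.x"] = accel.inner_product
    return table


if __name__ == "__main__":
    board = FakeAccelerator()
    primitives = build_primitives(board)
    print(primitives["+.x"]([1, 2, 3], [4, 5, 6]))
    print("calls routed to the board:", board.calls)
```

The point of the table is that the user's code never changes: only the binding behind the primitive does.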
| |
| aleph0 2006-10-30, 7:01 pm |
| Just a follow-up on what the CPU guys are doing, FYI:
AMD, Intel come-back kids with X86 vectorisation
http://uk.theinquirer.net/?article=35342
//
A LONG time ago - but probably not in a galaxy far away - there were
many vector supercomputers, with extremely parallel floating point
engines, often executing more than 32 FP ops per CPU core per cycle.
...
...
...
Problems? An obvious one - the coprocessor model is long gone from the
PC programming world, so the ability to program an external coprocessor
as an extension of CPU instruction set needs to be promoted all over
again. You may have to rely on dedicated libraries to offload the work
from CPU to GPU instead of direct extension of CPU instruction set,
especially if the GPU doesn't share a common memory management model
with the CPU
...
...
...
In summary, whether for gaming physics, black hole research or
financial simulations, a combination of multi-core and vector
processing will bring PCs close to the teraflop performance, and most
probably cross the teraflop peak speed barrier by 2010
//
| |
| phil chastney 2006-10-30, 7:01 pm |
| aleph0 wrote:
> Just a follow up on what the CPU guys are doing FYI :
>
> AMD, Intel come-back kids with X86 vectorisation
> http://uk.theinquirer.net/?article=35342
>
> //
> A LONG time ago - but probably not in a galaxy far away - there were
> many vector supercomputers, with extremely parallel floating point
> engines, often executing more than 32 FP ops per CPU core per cycle.
> ..
> ..
> ..
> Problems? An obvious one - the coprocessor model is long gone from the
> PC programming world, so the ability to program an external coprocessor
> as an extension of CPU instruction set needs to be promoted all over
> again. You may have to rely on dedicated libraries to offload the work
> from CPU to GPU instead of direct extension of CPU instruction set,
> especially if the GPU doesn't share a common memory management model
> with the CPU
> ..
> ..
> ..
> In summary, whether for gaming physics, black hole research or
> financial simulations, a combination of multi-core and vector
> processing will bring PCs close to the teraflop performance, and most
> probably cross the teraflop peak speed barrier by 2010
> //
>
isn't the value of the GPU limited rather by the fact that most of them
use saturated arithmetic? . . . /phil
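
For readers unfamiliar with the term, here is a small sketch of what saturating arithmetic means, assuming 8-bit unsigned values such as pixel channels. The function names are made up for illustration, and the thread itself does not establish which GPUs behave this way. It only shows why clamped results suit image work but not exact totals.

```python
U8_MAX = 255  # assumed range: 8-bit unsigned, as in a pixel channel


def saturating_add_u8(a: int, b: int) -> int:
    """Add two 8-bit values, clamping at the maximum instead of overflowing."""
    return min(a + b, U8_MAX)


def wrapping_add_u8(a: int, b: int) -> int:
    """Add two 8-bit values with ordinary modulo-256 wraparound."""
    return (a + b) % 256


def saturating_sum_u8(values) -> int:
    """Running total in which every intermediate step is clamped."""
    total = 0
    for v in values:
        total = saturating_add_u8(total, v)
    return total


if __name__ == "__main__":
    amounts = [100, 100, 100]
    print("exact sum:     ", sum(amounts))
    print("saturating sum:", saturating_sum_u8(amounts))
    print("wrapping 200+100:", wrapping_add_u8(200, 100))
```

A clamped pixel just looks fully bright, whereas a clamped running total is silently wrong, which is the concern for numerical work.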
| |
| aleph0 2006-10-30, 7:01 pm |
| > isn't the value of the GPU limited rather by the fact that most of them
> use saturated arithmetic?
No!
ATI, for example, recently released a software tool that lets people
use their GPU's engine directly for video encoding/decoding on
certain graphics cards.
Doing this job normally takes even a 4GHz PC anything from 1-12 hours -
depending on the codecs being used.
Using ATI's GPU engine + s/w tools, it's only a matter of minutes !!!
GPUs already do somewhere near 350 GFLOPS (32-bit).
It shouldn't take much to make the FP ops 64/80-bit, even if it means
only 175 GFLOPS, which is probably a worst case of course!
ClearSpeed already supplies such FP add-on cards for Opteron
server systems, AFAIK.
They currently cost ca. $15,000 apiece, but once they are mass-produced
the cost will of course come down!
| |
| aleph0 2006-10-30, 7:01 pm |
| aleph0 wrote:
> Doing this job normally takes even a 4GHz PC anything from 1-12 hours -
> depending on the codecs being used.
> Using ATI's GPU engine + s/w tools, it's only a matter of minutes !!!
David, I'll answer your question here :
4 GHz, in a nominal sense of course!
e.g. the "5200+" in AMD Athlon64 5200+ is a rating measured against a
1 GHz AMD Athlon (K7).
The "real" 4 GHz threshold has only been broken by the so-called
overclockers, using water-cooling and/or other methods! Many people
bought the much cheaper Opteron 140 series as it was found to overclock
extremely well without any major heat problems.
The old "MHz" myth, which ignored things like IPC and FSB, has now
been replaced with a much more honest approach, something which AMD
has been promoting for some years now.
e.g. what use is a 4 GHz processor when it has to wait "years" for
memory access?
The AMD HyperTransport (and coherent HyperTransport) architectures,
combined with AMD's on-chip memory controller, greatly reduce latency
between CPU and memory.
This also makes the scaling of multiple CPUs much more linear,
i.e. if you have 8 Opterons, it's going to be "nearly" as fast as the
sum of 2 x 4 Opterons.
| |
| aleph0 2006-10-30, 7:01 pm |
| Another follow-up:
The Games People Play
http://www.hpcwire.com/hpc/1003475.html
//
...
...
A 500-gigaflop processor is apparently already in the works.
...
...
Supercomputing users are envisioning CPU-GPU hybrid systems offering
boatloads of vector performance at a reasonable price. Vendors like
PANTA Systems are busy introducing such systems. PANTA's new platform,
which can combine traditional Opteron modules with NVIDIA GPU modules,
is profiled in this week's issue of HPCwire. On the software side,
startups like PeakStream and RapidMind are building development
platforms for general-purpose computing with GPUs (GPGPU), giving
application programmers access to the raw computing power on these
devices.
...
...
AMD seems to be out in front of the GPGPU wave. Whatever the company's
plans are for the ATI devices, one can assume it involves making these
devices even more commonplace across computing platforms. Whether they
migrate from external devices to co-processors to on-chip cores remains
to be seen. But however it plays out, AMD's plans to bring the GPU to
mainstream computing is certainly a bold move.
...
...
//
| |
| phil chastney 2006-10-30, 7:01 pm |
| aleph0 wrote:
>
>
> No!
> ATI, for example, recently released a software tool that lets people
> use their GPU's engine directly for video encoding/decoding on
> certain graphics cards.
> Doing this job normally takes even a 4GHz PC anything from 1-12 hours -
> depending on the codecs being used.
> Using ATI's GPU engine + s/w tools, it's only a matter of minutes !!!
have I understood this correctly: you're offloading a video processing
task from the CPU to the GPU -- is that right?
but isn't that still using saturated arithmetic? this isn't quite the
same thing as utilising the power of the GPU for financial analysis, is it?
> GPUs already do somewhere near 350 GFLOPS (32-bit).
> It shouldn't take much to make the FP ops 64/80-bit, even if it means
> only 175 GFLOPS, which is probably a worst case of course!
>
> ClearSpeed already supplies such FP add-on cards for Opteron
> server systems, AFAIK.
> They currently cost ca. $15,000 apiece, but once they are mass-produced
> the cost will of course come down!
ClearSpeed is a different kettle of fish, because that does "proper"
floating point arithmetic (IEEE?), and I would be more than happy to
play with one of those things for a while . . . /phil
| |
| aleph0 2006-10-30, 7:01 pm |
| Another follow-up on things to come:
AMD Intros ATI Vista Ready Chips
Oct 28, 2006
http://www.techtree.com/India/News/...-76790-581.html
//
...
We could not have achieved this without our partnership with ATI. From
day one, ATI has played a key role in helping us design and validate
the new driver model at the heart of Windows Vista, and ATI has since
developed robust and performant drivers that highlight the capabilities
of our new operating system.
...
...
In addition, ATI will be working in tandem with several leading
companies and academic institutions towards developing a new technology
called "Stream Computing" that the company says possesses the potential
to impact each and every sector of the market.
//
| |
| aleph0 2006-11-14, 6:56 pm |
| Update :
Cray's Jan Silverman Discusses New HPC Offerings
http://www.hpcwire.com/hpc/1086098.html
//
HPCwire: What technical innovations are in the XMT?
Silverman: The Cray XMT platform uses a new custom multithreading
processor chip we developed that we call Threadstorm. With AMD's
Torrenza socket initiative, we are able to design and then place these
new processor chips into the Opteron sockets in the Cray XT3/XT4
compute blades. Because the rest of the system is based on the Cray XT
infrastructure, the XMT automatically inherits a scalable system
architecture. The XMT scales from 24 to more than 8,000 sockets,
yielding over one million threads that can address 128 terabytes of
shared memory.
HPCwire: What about the programming environment?
Silverman: The Cray XMT has a programming environment focused on
developing multi-thread applications that includes advanced analysis
tools for software development and tuning. There are C and C++
compilers with automatic parallelization, STL and common math
libraries, a cross compiler on Linux processors, gcc/g++ on Linux
nodes, and Cray Apprentice2 for compiler analysis and performance
visualization.
//
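
A back-of-envelope check of the figures Silverman quotes, as a sketch only. It assumes the round numbers (8,000 sockets, one million threads, 128 terabytes) are exact and that a terabyte is 1,000 gigabytes. The interview gives them as "more than" and "over", so the real per-socket values may differ.

```python
# Round figures taken from the interview; treated as exact for this estimate.
SOCKETS = 8_000
THREADS = 1_000_000
SHARED_MEMORY_TB = 128
GB_PER_TB = 1_000  # assumption: decimal terabytes

threads_per_socket = THREADS / SOCKETS
memory_per_socket_gb = SHARED_MEMORY_TB * GB_PER_TB / SOCKETS

if __name__ == "__main__":
    print("threads per socket:  ", threads_per_socket)
    print("memory per socket GB:", memory_per_socket_gb)
```

On these assumptions each socket carries on the order of a hundred-plus hardware threads and a few tens of gigabytes of the shared memory.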
|