











 

Parallelization on Dell Workstations
Ganesh

2007-11-27, 4:29 am

Hi !

We have recently acquired 3 Dell 690 workstations (2 Double Dual
cores and 1 Double Quad core)

http://www.dell.com/content/product...=555&l=en&s=biz

What would be the ideal method to parallelize code on these
machines: MPI, OpenMP, or a combination of both?

We're using Fedora Core 6 on these machines. Which of the MPI
implementations is the easiest to install?

Ganesh V
linuxl4@sohu.com

2007-11-27, 8:09 am

Download MPICH (from http://www-unix.mcs.anl.gov/mpi/mpich/) or
Open MPI (from http://www.open-mpi.org/), then compile it.

Another way: give me an SSH account on your Dell workstations and I
will set it up for you within an hour. :)
Sebastian Hanigk

2007-11-27, 7:18 pm

Ganesh <ganesh.iitm@gmail.com> writes:

> We have recently acquired 3 Dell 690 workstations (2 Double Dual
> cores and 1 Double Quad core)
>
> What would be the ideal method to parallelize code on these
> machines: MPI, OpenMP, or a combination of both?


For SMP machines, OpenMP is usually more efficient because you use the
simplest and fastest communication scheme of all: shared memory. Also,
the effort to parallelise existing code with OpenMP is much smaller than
the respective MPI work.
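
To give a feel for how small that effort can be, here is a minimal sketch in C (Fortran has the equivalent `!$omp parallel do` directive). The function name `dot` and its signature are made up for illustration and are not taken from anyone's code in this thread. One directive turns a serial loop into a shared-memory parallel one:

```c
#include <assert.h>

/* Dot product of x and y, both of length n.
   The directive splits the loop iterations across threads and combines
   the per-thread partial sums at the end (a reduction).
   If the compiler is run without OpenMP support, the pragma is ignored
   and the loop runs serially with the same result. */
double dot(const double *x, const double *y, long n)
{
    assert(n >= 0);
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += x[i] * y[i];

    return sum;
}
```

With GCC this would be built with `-fopenmp`. The arrays are read directly by all threads, so no explicit messages are needed, which is the point made above about shared memory.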

On the other hand, if you're planning to work on supercomputers anyway,
MPI will be more or less a conditio sine qua non (most systems now are
massively parallel clusters). As for the MPI implementation: try Open MPI
(<http://www.open-mpi.org/>) and MPICH2
(<http://www.mcs.anl.gov/research/projects/mpich2/>).

A hybrid parallelisation model (MPI between nodes, OpenMP inside a node)
can always be retrofitted onto an existing MPI code (beware of
reentrancy!), but I would suggest concentrating on a working MPI
solution first.


Sebastian
Tim Prince

2007-11-27, 7:18 pm

Ganesh wrote:
> Hi !
>
> We have recently acquired 3 Dell 690 workstations (2 Double Dual
> cores and 1 Double Quad core)
>
> http://www.dell.com/content/product...=555&l=en&s=biz
>
> What would be the ideal method to parallelize code on these
> machines: MPI, OpenMP, or a combination of both?
>
> We're using Fedora Core 6 on these machines. Which of the MPI
> implementations is the easiest to install?


You are leaving this wide open to opinions. Certainly, the choice with
which an individual has experience will be easiest.
Depending on the characteristics of your applications, Intel Cluster
OpenMP may be good. If so, it should be easier than MPI.
Intel or HP-MPI will install even more easily than open source MPI.
Both have short term free evaluation.
It will be difficult to take advantage of the quad core when running as
a cluster along with dual core machines.
Plain OpenMP, running separate jobs on each node, might be the easiest
way to start, and to evaluate dual vs quad core performance.
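
A rough way to make that dual vs quad core comparison could look like the sketch below. It is only an illustration: `kernel` is a hypothetical stand-in for the real workload, and `time_kernel` and `wall_seconds` are names invented here. The idea is to run the same loop with a chosen number of OpenMP threads and record the wall-clock time.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Wall-clock seconds read from a monotonic clock. */
double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Hypothetical stand-in workload: partial sum of 1/k^2 for k = 1..n,
   computed with the requested number of threads. Without OpenMP
   support the pragma is ignored and the loop runs on one thread. */
double kernel(long n, int nthreads)
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum) num_threads(nthreads)
    for (long i = 0; i < n; ++i) {
        double k = (double)(i + 1);
        sum += 1.0 / (k * k);
    }
    return sum;
}

/* Runs the kernel once, stores its value in *result and returns the
   elapsed wall-clock time in seconds. */
double time_kernel(long n, int nthreads, double *result)
{
    assert(n >= 0 && nthreads >= 1);
    double t0 = wall_seconds();
    *result = kernel(n, nthreads);
    return wall_seconds() - t0;
}
```

Calling `time_kernel` with 1, 2, 4 and (on the quad-core box) 8 threads on each workstation separately would give a first impression of how the machines scale, before any decision about MPI.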
There is a lot of argument about whether hybrid MPI with OpenMP or MPI
alone is ideal for such machines. The optimum, again, depends on
factors which you haven't divulged.
Sebastian Hanigk

2007-11-27, 7:18 pm

Tim Prince <timothyprince@sbcglobal.net> writes:

> There is a lot of argument about whether hybrid MPI with OpenMP or MPI
> alone is ideal for such machines. The optimum, again, depends on
> factors which you haven't divulged.


Implicit hybrid programming has been useful on MareNostrum, currently
Europe's largest supercomputer, because the node interconnect
consists of only one Myrinet interface, which caused bandwidth problems
in our code when four MPI processes ran on each node. Using a
multithreaded LAPACK library with only one MPI process per node was
faster in that case (which makes it quite obvious that a sheer FLOPS
count is worthless without good interconnects).


Sebastian
Tim Prince

2007-11-27, 10:11 pm

Sebastian Hanigk wrote:
> Tim Prince <timothyprince@sbcglobal.net> writes:
>
>
> Implicit hybrid programming has been useful on Europe's currently
> largest supercomputer, the Mare Nostrum, because the node interconnect
> consists of only one Myrinet interface - which caused bandwidth problems
> in our code if four MPI processes were to run on each node. The usage of
> the multithreaded LAPACK-library with only one MPI-process per node was
> in that case faster (which makes it quite obvious that sheer FLOPS count
> is worthless without good interconnects).

If the processes on a single node send messages among themselves on
shared memory, and don't involve more messages to other nodes than in a
hybrid model, there is nothing to be gained in savings on communication.
A random mapping of processes to nodes, evidently, will saturate the
intra-node connection quickly.
We face similar issues in attempting to get good performance with 8-core
nodes: when performance per core is not as high as in 4-core nodes,
additional message traffic among nodes will limit cluster performance at
a smaller number of nodes.
Ganesh

2007-11-28, 4:26 am

hi !


On Nov 28, 7:28 am, Tim Prince <timothypri...@sbcglobal.net> wrote:
> Sebastian Hanigk wrote:
>

OK, we are not planning to connect these machines together. The
codes are going to be run on individual computers only.


However, I have had trouble understanding the architecture of these
computers. Take, e.g., a dual-core machine (one that is commonly
available for desktops these days). Is the memory shared between these
cores? How about the cache?

Going further: what happens in these high-end workstations? Take,
for instance, the double dual-core and the double quad-core. The
link I sent has all the info. However, I am not able to comprehend
it fully.

I have an MPI code working. Setting up OpenMP code is not that big a
deal, I guess. A combo would involve some effort as well. I am ready to
put in the effort needed. However, I need to know which one to put my
effort into.

ganesh
Sebastian Hanigk

2007-11-28, 4:26 am

Tim Prince <timothyprince@sbcglobal.net> writes:

> If the processes on a single node send messages among themselves on
> shared memory, and don't involve more messages to other nodes than in a
> hybrid model, there is nothing to be gained in savings on
> communication.


Of course. The problem was among other things that most of the
communication was not intra-node traffic.

> We face similar issues in attempting to get good performance with 8-core
> nodes, when performance per core is not as high as in 4-core nodes,
> additional message traffic among nodes will limit cluster performance at
> a smaller number of nodes.


Contention of the CPU-memory link, I presume?


Sebastian
Sebastian Hanigk

2007-11-28, 4:26 am

Ganesh <ganesh.iitm@gmail.com> writes:

> Ok.. we are not planning to connect these machines together.. ! The
> codes are going to be run on individual comps only !


So we're talking only about SMP parallelisation.

> However.. I have had trouble understanding the architecture of these
> computers. Take for e.g a Dual core machine (one that is commonly
> available for desktops these days). Is the memory shared between these
> cores ? How about the cache ?


The main memory (RAM) is shared among all cores. The cache sharing
depends on the architecture: Intel's Core 2 Duo has per-core L1 caches
but a shared L2 cache, while the AMD dual-cores have separate cache
hierarchies.

> Going further: what happens in these high-end workstations? Take,
> for instance, the double dual-core and the double quad-core. The
> link I sent has all the info. However, I am not able to comprehend
> it fully.


What do you mean by »happens«? In the case of a double quad-core
machine, you would have two processing modules with four cores each; for
most of your work you treat them like a simple 8-way SMP system.
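
If you want to see that view from inside a program, a small sketch like the following asks the operating system how many logical processors are online. The function name `online_cores` is invented here, and `_SC_NPROCESSORS_ONLN` is a glibc/Linux extension rather than strict POSIX, so treat this as an assumption about your Fedora setup:

```c
#include <unistd.h>

/* Number of logical processors the OS currently has online.
   The two sockets are not distinguished: the kernel presents all
   cores as one pool of processors sharing the same main memory. */
long online_cores(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;   /* fall back to 1 if the query fails */
}
```

The value can then be used, for example, to pick the thread count for an OpenMP run instead of hard-coding it per machine.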

> I have an MPI code working. Setting up OpenMP code is not that big a
> deal I guess. A combo would involve some effort as well. I am ready to
> put in the effort needed. However I need to know which one to put my
> effort on ?


Good MPI implementations should realise intra-node communication via
shared memory, so you lose only a little efficiency to the messaging
overhead. For the moment, I would suggest installing an MPI library and
running your code; you should do profiling runs and determine the
bottlenecks (communication? computation?).
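
Before reaching for a full profiler, a crude first answer to "communication or computation?" can come from hand-placed timers. The sketch below is only one possible way to do it: `phase_times`, `now_seconds` and `comm_fraction` are hypothetical names, and in an MPI code you could use the standard `MPI_Wtime()` instead of the clock call shown here.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Accumulated wall-clock time per phase, in seconds. */
typedef struct {
    double compute;   /* time spent in local computation */
    double comm;      /* time spent inside communication calls */
} phase_times;

/* Wall-clock seconds read from a monotonic clock. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Fraction of the measured time that went into communication;
   returns 0 if nothing has been measured yet. */
double comm_fraction(const phase_times *t)
{
    double total = t->compute + t->comm;
    return total > 0.0 ? t->comm / total : 0.0;
}

/* Intended use inside the main loop of the solver:
 *
 *     double t0 = now_seconds();
 *     ... local computation ...
 *     t.compute += now_seconds() - t0;
 *
 *     t0 = now_seconds();
 *     ... send/receive calls ...
 *     t.comm += now_seconds() - t0;
 */
```

A communication fraction that stays small on a single workstation suggests the existing MPI code is already a reasonable fit there; a large one points to the messaging overhead mentioned above.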


Sebastian








Copyright 2008 codecomments.com