spawning processes
| I know very little about Fortran, but am supervising someone who knows a
little about it (still learning).
We have a program that we would like to convert to do parallel processing.
So we have something like this (pseudocode, not Fortran):
someDataType array[100][100]; // 2 dimensional array of something
for (i = 1 to 100)
    for (j = 1 to 100)
        doCalculation (array [i,j]) // perform calculation on this element
    end for
end for
This is sequential. We can make it so that each element is independent of
any other and hence do parallel processing. Something like this:
someDataType array[100][100]; // 2 dimensional array of something
for (i = 1 to 100)
    for (j = 1 to 100)
        spawn (doCalculation (array [i,j])) // perform calculation
    end for
end for
wait (until all processes have returned)
I'd like to know if this can be done in Fortran, and if so, some sample code
showing how to do it would be great.
Thanks!
| |
| Arjen Markus 2005-05-13, 4:10 pm |
| Al wrote:
>
> I know very little about fortran, but am supervising someone who knows a
> little about it (still learning).
>
> We have a program that we would like to convert to do parallel processing.
> So we have something like this (pseudocode, not fortran)
>
> someDataType array[100][100]; // 2 dimensional array of something
> for (i = 1 to 100)
> for (j = 1 to 100)
> doCalculation (array [i,j]) // perform calculation on this element
> end for
> end for
>
> This is linear. We can make it so the each element is independent of any
> other and hence do parallel processing. Something like this:
>
> someDataType array[100][100]; // 2 dimensional array of something
> for (i = 1 to 100)
> for (j = 1 to 100)
> spawn (doCalculation (array [i,j])) // perform calculation
> end for
> end for
>
> wait (until all processes have returned)
>
> I like to know if this can be done in fortran, and if so, some sample code
> of how to do this would be great.
>
> Thanks!
Well, if each computation for an array item is independent, then the
easiest solution is to use OpenMP - a set of compiler directives that
instruct the compiler to create threads and pass the information to
them and gather everything.
I have no source code at hand right now, but just look up the examples
in your compiler's documentation - there are bound to be examples.
Regards,
Arjen
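
A minimal sketch of the OpenMP approach described above, applied to the 100 x 100 loop from the question. The function do_calculation and the input values are hypothetical stand-ins, not code from the original program. The directive needs an OpenMP-enabling compiler option (for example -fopenmp with gfortran; the option name varies by compiler). Without it the !$omp lines are treated as ordinary comments and the program runs serially.

```fortran
module calc_mod
  implicit none
contains
  ! Hypothetical stand-in for doCalculation: the result depends only
  ! on the element passed in, so the calls are independent.
  pure function do_calculation(x) result(y)
    real, intent(in) :: x
    real :: y
    y = sqrt(x) + 1.0
  end function do_calculation
end module calc_mod

program omp_loop
  use calc_mod
  implicit none
  integer, parameter :: n = 100
  real :: a(n, n)
  integer :: i, j

  ! Fill the array with some made-up input values.
  do j = 1, n
    do i = 1, n
      a(i, j) = real(i + j)
    end do
  end do

  ! The iterations of the j loop are divided among the threads.
  ! All threads are joined again at "end parallel do", which plays
  ! the role of the "wait" step in the pseudocode.
  !$omp parallel do default(none) shared(a) private(i, j)
  do j = 1, n
    do i = 1, n
      a(i, j) = do_calculation(a(i, j))
    end do
  end do
  !$omp end parallel do

  print *, 'a(1,1) =', a(1, 1), ' a(n,n) =', a(n, n)
end program omp_loop
```

Rather than spawning one task per element, the loop hands each thread a block of columns, which keeps the overhead small when there are far more elements than processors.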
| |
|
| Thanks for the replies.
I've looked into OpenMP. It looks like it is available on all IBM and SGI
supercomputers, which offer a "shared memory" architecture. However, I'm not
certain whether Linux clusters offer it. This is important since these
clusters are the most common supercomputing environment.
There is also MPI (http://www-unix.mcs.anl.gov/mpi/) which may prove to
be useful.
Herman D. Knoble wrote:
> OpenMP is an excellent suggestion. A working example, not unlike this
> specific case involving loops, appears at:
> http://www.openmp.org/drupal/samples/md.html
> Look for the !$omp statements, in particular in Subroutine Update.
>
> This assumes, of course, that the platform where this code is to be run
> supports a compiler option like -openmp and that there is
> available more than one processor (preferably several more than two).
>
> Most of the linux compilers like Intel ifort and PathScale pathf90,
> Lahey LF90, Absoft f90, Portland pgi90 all support the option: -openmp
>
> Some more OpenMP examples are available at: http://www.openmp.org/
> Tutorials on parallel computing: http://www.mhpcc.edu/training/tutorials/
> General reference on High Performance Fortran:
> http://dacnet.rice.edu/Depts/CRPC/HPFF/index.cfm
>
>
> Skip Knoble
| |
| Paul Van Delst 2005-05-18, 4:00 pm |
| Al wrote:
> Thanks for the replies.
>
> I've looked into openmp. It looks like it is on all IBM and SGI
> supercomputers,
> which offer a "shared memory" architecture. However, I'm not certain
> whether linux clusters offer it. This is important since these
> clusters are the most common supercomputing environment.
I didn't think a shared memory architecture was necessary for OpenMP. The IBM
supercomputers here at NCEP have distributed memory architectures (to the best of my
understanding) and they have OpenMP.
> There is also MPI (http://www-unix.mcs.anl.gov/mpi/) which may prove to
> be useful.
OpenMP and MPI are slightly different beasties. I'm not an expert, but it's my
understanding that OpenMP is used to distribute processes across CPUs within a node,
whereas MPI is used to distribute processes across different nodes.
I think both are available for linux clusters.
cheers,
paulv
--
Paul van Delst
CIMSS @ NOAA/NCEP/EMC
| |
|
| I've done something similar to what you describe, using MPI on Linux. It was
a loop like that, and I could make the parallelized version easily, with just
a few extra lines of code. And to let the program still work in the serial
version, I just used compiler directives.
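
A rough sketch of how a loop like the one in the question might be split across MPI processes. This is an illustration under assumptions, not the poster's code: the helper block_range, the placeholder calculation and the final sum are all made up for the example. It needs an MPI library, and is typically built with that library's compiler wrapper and started with its launcher (names such as mpif90 and mpirun are common but vary between installations).

```fortran
module block_mod
  implicit none
contains
  ! Split columns 1..n into nprocs nearly equal contiguous blocks and
  ! return the first and last column owned by the given rank (0-based).
  pure subroutine block_range(n, nprocs, rank, first, last)
    integer, intent(in)  :: n, nprocs, rank
    integer, intent(out) :: first, last
    integer :: base, extra
    base  = n / nprocs
    extra = mod(n, nprocs)
    first = rank * base + min(rank, extra) + 1
    last  = first + base - 1
    if (rank < extra) last = last + 1
  end subroutine block_range
end module block_mod

program mpi_loop
  use mpi
  use block_mod
  implicit none
  integer, parameter :: n = 100
  double precision :: a(n, n), local_sum, total
  integer :: ierr, rank, nprocs, first, last, i, j

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  call block_range(n, nprocs, rank, first, last)

  ! Each process works only on its own block of columns.
  local_sum = 0.0d0
  do j = first, last
    do i = 1, n
      a(i, j) = sqrt(dble(i + j))   ! placeholder calculation
      local_sum = local_sum + a(i, j)
    end do
  end do

  ! Combine the partial results on rank 0.
  call MPI_Reduce(local_sum, total, 1, MPI_DOUBLE_PRECISION, MPI_SUM, 0, &
                  MPI_COMM_WORLD, ierr)
  if (rank == 0) print *, 'sum over all elements =', total

  call MPI_Finalize(ierr)
end program mpi_loop
```

Every process runs the same program. The only difference between them is the value of rank, and therefore the range of columns each one handles.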
| |
| Richard Edgar 2005-05-18, 4:00 pm |
| Paul Van Delst wrote:
>
> I didn't think a shared memory architecture was necessary for OpenMP.
> The IBM supercomputers here at NCEP have distributed memory
> architectures (to the best of my understanding) and they have OpenMP.
I don't think that shared memory is _necessary_ for OpenMP.... in a
similar way to the only _necessary_ instruction for a CPU is 'subtract
and branch on borrow.' :-)
OpenMP does rather assume that any thread can get at any data relatively
quickly. That's simply not true for a distributed memory machine.
> OpenMP and MPI are slightly different beasties. I'm not an expert, but
> it's my understanding that OpenMP is used to distribute processes across
> CPUs within a node, whereas MPI is used to distribute processes across
> different nodes.
They are certainly very different - and I've even heard rumours of
hybrid codes, which use OpenMP on each node (perhaps each node has four
processors), and MPI for communication between nodes. I've used OpenMP
more than MPI, but I'd make the following comments:
1) If you already have a working code, and a multiprocessor machine, then
OpenMP is a good way of getting small speed-ups (up to a factor of four,
maybe eight) with a minimum of effort. This (naturally) only applies if the
problem is intrinsically parallel, and the code has been reasonably
cleanly written. Profile it, find the expensive loops, and parallelise those.
2) MPI is a lot more difficult generally (full of calls like 'send these
bytes to this processor'). Although it has been done, I wouldn't fancy
retrofitting it to a substantial existing program either - it would be
better to design it from scratch to be an MPI code. However, by
requiring the programmer to be smarter, MPI is _always_ going to beat
OpenMP in the end. The relative cheapness of Linux clusters skews the
situation further.
> I think both are available for linux clusters.
Well, Altix machines are shared memory ;-)
IME, quite a few clusters are also built around dual CPU boxes, and it's
been worth my while to run OpenMP codes on a single node, rather than
trying to use MPI. I do use an MPI code, but that was written by other
people. And I found extending it to be ugly - I had a custom datatype
which I wanted to pass around. This had to be done by packing the
information into an array, sending it, and unpacking it again at the
end. Not pleasant.
HTH,
Richard
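
The pack, send, unpack pattern described above might look roughly like this. The particle type and both helper functions are hypothetical, chosen only to illustrate the idea. The actual communication is left out so that the sketch runs without an MPI library.

```fortran
module particle_mod
  implicit none
  integer, parameter :: packed_size = 4

  ! Hypothetical derived type standing in for the custom datatype.
  type :: particle
    double precision :: pos(3)
    double precision :: mass
  end type particle
contains
  ! Copy the components into a plain array that can be sent as-is.
  pure function pack_particle(p) result(buf)
    type(particle), intent(in) :: p
    double precision :: buf(packed_size)
    buf(1:3) = p%pos
    buf(4)   = p%mass
  end function pack_particle

  ! Rebuild the derived type from the plain array on the receiving side.
  pure function unpack_particle(buf) result(p)
    double precision, intent(in) :: buf(packed_size)
    type(particle) :: p
    p%pos  = buf(1:3)
    p%mass = buf(4)
  end function unpack_particle
end module particle_mod

program pack_demo
  use particle_mod
  implicit none
  type(particle) :: p, q
  double precision :: buf(packed_size)

  p%pos  = (/ 1.0d0, 2.0d0, 3.0d0 /)
  p%mass = 0.5d0

  buf = pack_particle(p)      ! sender side
  ! In an MPI code, buf would be handed to a send call here and
  ! filled in by the matching receive call on the other process.
  q = unpack_particle(buf)    ! receiver side

  print *, 'round trip ok: ', all(q%pos == p%pos) .and. q%mass == p%mass
end program pack_demo
```

The unpleasant part is that the packing code has to be kept in step with the type definition by hand: adding a component means changing packed_size and both helpers.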
| |
| Ian Bush 2005-05-19, 8:56 am |
| Richard Edgar wrote:
>
> I don't think that shared memory is _necessary_ for OpenMP.... in a
> similar way to the only _necessary_ instruction for a CPU is 'subtract
> and branch on borrow.' :-)
>
> OpenMP does rather assume that any thread can get at any data relatively
> quickly. That's simply not true for a distributed memory machine.
It just assumes that any thread can get at any data that is declared to be
shared. Speed is useful though!
>
>
> They are certainly very different - and I've even heard rumours of
> hybrid codes, which use OpenMP on each node (perhaps each node has four
> processors), and MPI for communication between nodes.
Yes, these certainly exist. For example, I've seen performance figures
for a weather prediction code comparing pure MPI and mixed mode.
> I've used OpenMP
> more than MPI, but I'd make the following comments:
>
> 1) If you already have a working code, and multiprocessor machine, then
> OpenMP is a good way of getting small speed ups (up to four, maybe
> eight) with a minimum of effort. This (naturally) only applies if the
> problem is intrinsically parallel, and the code has been reasonably
> cleanly written. Profile it, find the expensive loops, and parallelise
> those
>
> 2) MPI is a lot more difficult generally (full of calls like 'send these
> bytes to this processor). Although it has been done, I wouldn't fancy
> retrofitting it to a substantial existing program either - it would be
> better to design it from scratch to be an MPI code. However, by
> requiring the programmer to be smarter, MPI is _always_ going to beat
> OpenMP in the end. The relative cheapness of Linux clusters skews the
> situation further
My experience is rather the other way around; in fact my day-to-day
programming uses MPI. In general I agree with Richard's comments,
apart from the difficulty of MPI. I don't find it much harder than OpenMP,
and in some ways I find it easier (all variables are effectively
private, so you don't have to worry about what should be shared, critical
regions, etc.). However, that's probably a function of familiarity.
I would also say that while it is in general best to design a parallel
code as a parallel code from scratch, in quite a few cases retrofitting
is not as bad as you may think. Well, I have to do a lot of it, so it
can't be too bad!
Ian
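
To make the OpenMP bookkeeping mentioned above concrete, here is a small sketch of the decisions involved: which variables are shared, which are private, and where a critical region is needed. The routine and its data are invented for illustration. In practice the maximum would more likely be handled with a max reduction; the critical region is used here only to show the construct.

```fortran
module sharing_mod
  implicit none
contains
  ! Returns the sum and the largest value of the squares of x.
  subroutine sum_and_max_of_squares(x, total, biggest)
    double precision, intent(in)  :: x(:)
    double precision, intent(out) :: total, biggest
    double precision :: tmp
    integer :: i

    total   = 0.0d0
    biggest = 0.0d0   ! squares are never negative

    ! x       : read by every thread, so shared
    ! tmp, i  : scratch values, one copy per thread, so private
    ! total   : each thread sums its own part, combined by the reduction
    ! biggest : shared and updated by several threads, so the update
    !           must be protected by a critical region
    !$omp parallel do default(none) shared(x, biggest) private(i, tmp) reduction(+:total)
    do i = 1, size(x)
      tmp = x(i) * x(i)
      total = total + tmp
      !$omp critical
      if (tmp > biggest) biggest = tmp
      !$omp end critical
    end do
    !$omp end parallel do
  end subroutine sum_and_max_of_squares
end module sharing_mod

program sharing_demo
  use sharing_mod
  implicit none
  double precision :: x(4), total, biggest

  x = (/ 3.0d0, -4.0d0, 1.0d0, 2.0d0 /)
  call sum_and_max_of_squares(x, total, biggest)
  print *, 'sum of squares =', total, ' largest square =', biggest
end program sharing_demo
```

In an MPI version of the same routine none of these clauses would exist, since each process has its own copy of every variable; the combination step would instead be an explicit communication call.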
| |
|
| I don't think that mixed MPI/OMP codes are so unusual anymore. At least
two codes I work with have been written in that way.
Personally, I like MPI codes more, as they seem much easier to
maintain. It is hard to break a working MPI code, but OMP directives, being
just comments, can easily be overlooked or coded away. Furthermore, MPI is
more general: few people (at least in academia) have a 64-CPU shared
memory machine, but many have a Linux cluster that size. MPI will run
out-of-the-box (ideal world) on both systems. Having said that, the
power of OMP is certainly that it can be *very easy* to parallelise
existing serial code over a few CPUs (say 2-4, a typical Linux
workstation size). It is something I do from time to time with the code
I use to analyse results. The first version is written in a serial way
and takes a few seconds to run; as the amount of data grows, it may take
several minutes. At that point, while waiting for the code to
finish, you can easily add 3 or 4 directives (or even just a single
compiler switch) and halve the walltime needed.
Cheers,
Joost