| Author |
Fortran 90 MPI interface with non-parallel sub-model
| I'm new to the MPI field, so sorry if this is obvious.
I have two numerical models that interact quite happily in a non-MPI
environment...but complain when running under MPI.
Model 1: Fortran 90, optimised for use under MPI
Model 2 (called from Model 1): Fortran 77...not even remotely MPI enabled.
The data sharing/exchange amounts to a few 3-D arrays every few
time-steps. What I would like to do is run Model 1 on multiple
processors, then, when the call to sub-model 2 arrives, collapse
down to 1-processor use (i.e. temporarily no MPI). Is this
sensible/possible?
It seems that I could do a lot of MPI_GATHERs in each sub-model 2
subroutine, and specify that each step runs on PE 0...but this seems
a bit messy (if it works at all).
I'm running on IBM AIX, using xlf_r/mpxlf_r etc.
Thanks
Steve
| |
| unix_guru 2006-09-29, 8:01 am |
| On 28 Sep 2006 23:31:35 -0700, "steve" <se_george@yahoo.co.uk> wrote:
>I'm new to the MPI field, so sorry if this is obvious.
>
>I have two numerical models that interact quite happily in a non-MPI
>environment...but complain when running under MPI.
>
> Model 1: Fortran 90, optimised for use under MPI
> Model 2 (called from Model 1): Fortran 77...not even remotely MPI enabled.
>
>The data sharing/exchange amounts to a few 3-D arrays every few
>time-steps. What I would like to do is run Model 1 on multiple
>processors, then, when the call to sub-model 2 arrives, collapse
>down to 1-processor use (i.e. temporarily no MPI). Is this
>sensible/possible?
>
>It seems that I could do a lot of MPI_GATHERs in each sub-model 2
>subroutine, and specify that each step runs on PE 0...but this seems
>a bit messy (if it works at all).
>
>I'm running on IBM AIX, using xlf_r/mpxlf_r etc.
>
>Thanks
>
>Steve
why collapse it down? why not just launch it on its own thread and
assign it to a specific processor by setting the affinity? let the
other processors continue to chew on rest of the code.
a sloppy way would be, when you get to the f77 code, to launch it in its
own memory space and reset the environment variables for that space.
--
Posted via NewsDemon.com - Premium Uncensored Newsgroup Service
------->>>>>>http://www.NewsDemon.com<<<<<<------
Unlimited Access, Anonymous Accounts, Uncensored Broadband Access
| |
| Craig Powers 2006-09-29, 7:01 pm |
| unix_guru wrote:
> On 28 Sep 2006 23:31:35 -0700, "steve" <se_george@yahoo.co.uk> wrote:
>
>
>
> why collapse it down? why not just launch it on its own thread and
> assign it to a specific processor by setting the affinity? let the
> other processors continue to chew on rest of the code.
Since he's talking MPI, I should think an MPI-based solution would be
more appropriate. I have a working knowledge of MPI, and I have no idea
how to do what you're suggesting. On the other hand, I do have some
ideas on what he wants to do.
| |
| Craig Powers 2006-09-29, 7:01 pm |
| steve wrote:
> I'm new to the MPI field, so sorry if this is obvious.
>
> I have two numerical models that interact quite happily in a non-MPI
> environment...but complain when running under MPI.
>
> Model 1: Fortran 90, optimised for use under MPI
> Model 2 (called from Model 1): Fortran 77...not even remotely MPI enabled.
>
> The data sharing/exchange amounts to a few 3-D arrays every few
> time-steps. What I would like to do is run Model 1 on multiple
> processors, then, when the call to sub-model 2 arrives, collapse
> down to 1-processor use (i.e. temporarily no MPI). Is this
> sensible/possible?
Without specific knowledge of the problem you're trying to solve, I
can't say if it's sensible. It's certainly possible to make it work,
conceptually (although, as a practical matter, there won't be "no MPI"
while you're running only one logical solution of model 2).
If you can, the simplest way to do it is to gather all of the necessary
inputs to sub-model 2 and have all the processors compute it
(redundantly). You'll then save on communication at the end -- all
processors will arrive at the same answer independently.
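A minimal sketch of this first option, assuming a 1-D decomposition in which every rank owns an equal slab of the last array dimension (so each slab is contiguous in Fortran's column-major layout). The array sizes and the stub `submodel2` are hypothetical stand-ins for the real serial model. `MPI_ALLGATHER` is used rather than `MPI_GATHER` because a plain gather fills the receive buffer only on the root rank.

```fortran
program redundant_submodel
  use mpi
  implicit none
  ! Hypothetical sizes: each rank owns nz_local planes of the last dimension.
  integer, parameter :: nx = 4, ny = 3, nz_local = 2
  integer :: ierr, rank, nprocs, nz, slab
  double precision, allocatable :: local(:,:,:), global(:,:,:)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  nz   = nz_local * nprocs
  slab = nx * ny * nz_local            ! elements contributed by each rank
  allocate(local(nx, ny, nz_local), global(nx, ny, nz))
  local = dble(rank)                   ! placeholder data from Model 1

  ! Every rank receives the complete array, ordered by rank.
  call MPI_ALLGATHER(local,  slab, MPI_DOUBLE_PRECISION, &
                     global, slab, MPI_DOUBLE_PRECISION, &
                     MPI_COMM_WORLD, ierr)

  ! Every rank runs the serial sub-model on the full array (redundantly).
  call submodel2(global, nx, ny, nz)

  ! No communication needed afterwards: each rank keeps its own slab.
  local = global(:, :, rank*nz_local + 1 : (rank + 1)*nz_local)

  deallocate(local, global)
  call MPI_FINALIZE(ierr)

contains

  ! Stand-in for the unmodified, MPI-free Fortran 77 model.
  subroutine submodel2(field, n1, n2, n3)
    integer, intent(in) :: n1, n2, n3
    double precision, intent(inout) :: field(n1, n2, n3)
    field = field + 1.0d0
  end subroutine submodel2

end program redundant_submodel
```

If the slabs are not all the same size, `MPI_ALLGATHERV` with per-rank counts and displacements would be the analogous call.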
If this isn't feasible, then you can gather the inputs on one processor;
procid 0 is a sensible default, but it needn't be any in particular if
there's a chance another would make more sense. Have the other
processors hold at a barrier, then distribute the necessary parts of the
results and continue in parallel.
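The second option could be sketched as follows, under the same assumptions as before (equal slabs along the last dimension; `submodel2` and the sizes are hypothetical). Only the root rank holds the full array and calls the serial model. No explicit `MPI_BARRIER` is strictly required in this sketch, because the other ranks cannot return from `MPI_SCATTER` until the root has supplied their data.

```fortran
program serial_on_root
  use mpi
  implicit none
  integer, parameter :: nx = 4, ny = 3, nz_local = 2   ! hypothetical sizes
  integer, parameter :: root = 0
  integer :: ierr, rank, nprocs, nz, slab
  double precision, allocatable :: local(:,:,:), global(:,:,:)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  nz   = nz_local * nprocs
  slab = nx * ny * nz_local
  allocate(local(nx, ny, nz_local))
  if (rank == root) then
    allocate(global(nx, ny, nz))       ! full array lives on the root only
  else
    allocate(global(1, 1, 1))          ! dummy; ignored on non-root ranks
  end if
  local = dble(rank)                   ! placeholder data from Model 1

  ! Collect all slabs on the root, ordered by rank.
  call MPI_GATHER(local,  slab, MPI_DOUBLE_PRECISION, &
                  global, slab, MPI_DOUBLE_PRECISION, &
                  root, MPI_COMM_WORLD, ierr)

  ! Only the root runs the serial sub-model; the others wait in the scatter.
  if (rank == root) call submodel2(global, nx, ny, nz)

  ! Hand each rank its part of the result and continue in parallel.
  call MPI_SCATTER(global, slab, MPI_DOUBLE_PRECISION, &
                   local,  slab, MPI_DOUBLE_PRECISION, &
                   root, MPI_COMM_WORLD, ierr)

  deallocate(local, global)
  call MPI_FINALIZE(ierr)

contains

  ! Stand-in for the unmodified, MPI-free Fortran 77 model.
  subroutine submodel2(field, n1, n2, n3)
    integer, intent(in) :: n1, n2, n3
    double precision, intent(inout) :: field(n1, n2, n3)
    field = field + 1.0d0
  end subroutine submodel2

end program serial_on_root
```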
There's a third option -- parallelize Model 2. F77 vs. F90 doesn't
really matter, there's very little of MPI that has any need of F90.
| |
| unix_guru 2006-09-29, 7:01 pm |
| On Fri, 29 Sep 2006 12:00:48 -0400, Craig Powers <enigma@hal-pc.org>
wrote:
>unix_guru wrote:
>
>Since he's talking MPI, I should think an MPI-based solution would be
>more appropriate. I have a working knowledge of MPI, and I have no idea
>how to do what you're suggesting. On the other hand, I do have some
>ideas on what he wants to do.
so you don't know how to do what i suggested. so what, doesn't mean the
author won't.
the author is referring to the non-parallelized part of the code. if he
were interested in recoding to MPICH, PVM, or LAM, it would have been
indicated, as programming up any one of these is easy. it is most
likely some old spaghetti code, which i have hacked plenty of. go find
some old NEC code and try to untangle it.
| |
| unix_guru 2006-09-29, 9:59 pm |
| On Fri, 29 Sep 2006 12:05:56 -0400, Craig Powers <enigma@hal-pc.org>
wrote:
>steve wrote:
>
>Without specific knowledge of the problem you're trying to solve, I
>can't say if it's sensible. It's certainly possible to make it work,
>conceptually (although, as a practical matter, there won't be "no MPI"
>while you're running only one logical solution of model 2).
>
>If you can, the simplest way to do it is to gather all of the necessary
>inputs to sub-model 2 and have all the processors compute it
>(redundantly). You'll then save on communication at the end -- all
>processors will arrive at the same answer independently.
you have just created quite possibly a huge bottleneck across all of
the nodes and clogged up the entire system for one program.
>
>If this isn't feasible, then you can gather the inputs on one processor;
>procid 0 is a sensible default, but it needn't be any in particular if
>there's a chance another would make more sense. Have the other
>processors hold at a barrier, then distribute the necessary parts of the
>results and continue in parallel.
basically all you are describing is a forked threading model. you
might as well recompile the code with OpenMP directives, as all of these
are already built in: atomic, lock, barrier, etc.
>
>There's a third option -- parallelize Model 2. F77 vs. F90 doesn't
>really matter, there's very little of MPI that has any need of F90.
try recompiling the code with different optimizations (-funroll-loops,
inlining, etc.) and see if the compiler can squeeze any more out of the
routine first.
| |
|
| Hi Craig
Thanks for the responses:
Craig Powers wrote:
> If you can, the simplest way to do it is to gather all of the necessary
> inputs to sub-model 2 and have all the processors compute it
> (redundantly). You'll then save on communication at the end -- all
> processors will arrive at the same answer independently.
I guess I must be doing something wrong here. I performed an MPI_GATHER
on all appropriate arrays and then called my sub-model. The system seems
to be attempting to split this call once more, i.e. if I perform a
maxval/minval on the (gathered) array in the calling routine I get
sensible values, but once in the subroutine I end up with non-sensible
values at one end of the range...I assume these are unfilled memory
locations (due to the array splitting in the MPI process). These checks
are fine in the non-MPI version btw. Sorry, maybe I'm missing something
obvious here.
> There's a third option -- parallelize Model 2. F77 vs. F90 doesn't
> really matter, there's very little of MPI that has any need of F90.
I'd considered this, but the problem is that it's a community model,
i.e. changing the code would create a fork that would be incompatible
with official updates etc.
Steve
| |
| Craig Powers 2006-10-02, 7:03 pm |
| unix_guru wrote:
> the author is referring to the non-parallelized part of the code. if he
> were interested in recoding to MPICH, PVM, or LAM, it would have been
> indicated, as programming up any one of these is easy. it is most
> likely some old spaghetti code, which i have hacked plenty of. go find
> some old NEC code and try to untangle it.
He's talking about a package which is already partly done in MPI. I
indicated one method that may be perfectly usable without doing any
untangling of the code at all.
| |
| Craig Powers 2006-10-02, 7:03 pm |
| steve wrote:
> Hi Craig
>
> Thanks for the responses:
>
> Craig Powers wrote:
>
> I guess I must be doing something wrong here. I performed an MPI_GATHER
> on all appropriate arrays and then called my sub-model. The system seems
> to be attempting to split this call once more, i.e. if I perform a
> maxval/minval on the (gathered) array in the calling routine I get
> sensible values, but once in the subroutine I end up with non-sensible
> values at one end of the range...I assume these are unfilled memory
> locations (due to the array splitting in the MPI process). These checks
> are fine in the non-MPI version btw. Sorry, maybe I'm missing something
> obvious here.
MPI isn't going to do any array splitting that you don't tell it to do.
I would be suspicious of some sort of error in your array gathering,
or subsequent array dividing code getting called when you don't intend
it. You may want to double-check your coding, verify that all types,
ranges, and limits are correct, maybe do some sort of sanity check
between the gathering and the calling of the sub-model.
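One possible sanity check of this kind, sketched under assumptions: the decomposition (equal slabs along the last dimension), the sizes, and the placeholder data are all hypothetical. It compares the min/max of the gathered array on the root against values reduced independently from the local pieces. It is also worth keeping in mind that `MPI_GATHER` fills the receive buffer only on the root rank; on every other rank the "gathered" array is left untouched, so inspecting or using it there will show whatever was in memory.

```fortran
program check_gather
  use mpi
  implicit none
  integer, parameter :: nx = 4, ny = 3, nz_local = 2   ! hypothetical sizes
  integer, parameter :: root = 0
  integer :: ierr, rank, nprocs, slab
  double precision, allocatable :: local(:,:,:), global(:,:,:)
  double precision :: lmin, lmax, gmin, gmax

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  slab = nx * ny * nz_local
  allocate(local(nx, ny, nz_local))
  if (rank == root) then
    allocate(global(nx, ny, nz_local * nprocs))
  else
    allocate(global(1, 1, 1))          ! dummy; not filled on non-root ranks
  end if
  call random_number(local)
  local = local + dble(rank)           ! placeholder data, distinct per rank

  call MPI_GATHER(local,  slab, MPI_DOUBLE_PRECISION, &
                  global, slab, MPI_DOUBLE_PRECISION, &
                  root, MPI_COMM_WORLD, ierr)

  ! Independent reference: reduce the local extremes onto the root.
  lmin = minval(local)
  lmax = maxval(local)
  call MPI_REDUCE(lmin, gmin, 1, MPI_DOUBLE_PRECISION, MPI_MIN, &
                  root, MPI_COMM_WORLD, ierr)
  call MPI_REDUCE(lmax, gmax, 1, MPI_DOUBLE_PRECISION, MPI_MAX, &
                  root, MPI_COMM_WORLD, ierr)

  ! A gather only moves data, so the extremes must match exactly.
  if (rank == root) then
    if (minval(global) /= gmin .or. maxval(global) /= gmax) then
      print *, 'gather mismatch: got ', minval(global), maxval(global), &
               ' expected ', gmin, gmax
    else
      print *, 'gathered array range OK: ', gmin, gmax
    end if
  end if

  deallocate(local, global)
  call MPI_FINALIZE(ierr)
end program check_gather
```

A mismatch here would point at wrong counts, datatypes, or array bounds in the gather rather than at the sub-model itself.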
I'm assuming, here, that the sub-model contains absolutely no MPI code,
since in the original post you referred to it as "not remotely MPI enabled".
| |
| Craig Powers 2006-10-02, 7:03 pm |
| unix_guru wrote:
> On Fri, 29 Sep 2006 12:05:56 -0400, Craig Powers <enigma@hal-pc.org>
> wrote:
>
>
> you have just created quite possibly a huge bottleneck across all of
> the nodes and clogged up the entire system for one program.
/I/ didn't create the bottleneck, it's fundamental to the problem stated
by the OP. He has what I understand to be a unified program where one
segment is parallelized, a second segment is not, and both bits run
sequentially. I further assumed that this was iterated, so that within
a single run there would be a sequence of parallel -> serial -> parallel
(...).
As long as that sequence exists, there will fundamentally be a segment
where the processors are tied up but idling. With the usage
patterns I am associated with, we don't worry about trying to deal with
that; I suppose in other organizations that may not be acceptable.
If any of my assumptions are incorrect, then the recommendations change.
>
> basically all you are describing is a forked threading model. you
> might as well recompile the code with OpenMP directives, as all of these
> are already built in: atomic, lock, barrier, etc.
Are there any references for mixing OpenMP with MPI? They strike me as
fundamentally different solutions for fundamentally different
parallelization models. I understand OpenMP to be appropriate for
low-latency parallelization, e.g. multiple cores on a single node, with
MPI appropriate for high-latency, e.g. multiple nodes in a cluster.
I work with a clustering system that parcels out nodes to jobs, so it
would be very dangerous to use OpenMP without respecting the cluster
control's expectations of resource usage.
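For what it's worth, the MPI-2 standard does provide a hook for combining the two: `MPI_INIT_THREAD` lets a program request a level of thread support. A minimal sketch, assuming an MPI library and compiler with OpenMP enabled, and assuming the thread count per rank (typically set through `OMP_NUM_THREADS`) is kept within whatever the cluster scheduler has allocated:

```fortran
program hybrid_sketch
  use mpi
  use omp_lib
  implicit none
  integer :: ierr, rank, provided

  ! Request FUNNELED support: only the thread that initialised MPI makes MPI calls.
  call MPI_INIT_THREAD(MPI_THREAD_FUNNELED, provided, ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  ! The library may grant less than was requested, so check.
  if (provided < MPI_THREAD_FUNNELED .and. rank == 0) then
    print *, 'warning: MPI library provides less than MPI_THREAD_FUNNELED'
  end if

  ! Threads share work within a rank; no MPI calls inside the region.
  !$omp parallel
  print *, 'rank', rank, 'thread', omp_get_thread_num(), &
           'of', omp_get_num_threads()
  !$omp end parallel

  call MPI_FINALIZE(ierr)
end program hybrid_sketch
```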
>
> try recompiling the code with different optimizations (-funroll-loops,
> inlining, etc.) and see if the compiler can squeeze any more out of the
> routine first.
I trust the OP to understand the models well enough to know whether
there's anything to be gained by parallelizing the model or not. If
speed is important, the other optimization options should already have
been exercised.
|