Inlining functions using *CONTAINS* statement
Arindam Chakraborty

2004-10-26, 3:56 am

Hi,
I wish to inline a function in a loop, but I can't manually replace
the function call with its source code since the function is a HIGH
maintenance function and will be updated/modified frequently. The loop
is a high traffic loop and takes up 99% of the total run time of the
code. Right now I am using compiler directives to do the inlining, but
I would like to hardcode it into the program. By the way, the code is
written in F90.

[1]. Is there any way to inline a function (maybe using CPP
directives, or include statements) without replacing the function call
with its source code?

[2]. Does it matter in optimization and inlining (by compiler) if a
function is defined using the CONTAINS statement within the calling
program?

[3]. Any tips for optimizing big loops and function calls would be very
useful.

Here is an example (test codes are listed below):

Given a compiler (xlf90, pgf90, ifort) and an optimization level (say
O5), will program prog1 be better optimized than prog2?

Thanks in advance,
-Ari

PROGRAM prog1
  implicit none
  integer :: i,N
  double precision :: x,y

  ! N is very large ~10**12

  do i=1,N
    y = bigfunc(x)
  end do

CONTAINS

  FUNCTION bigfunc(x)
    : : :
    : : :
    : : :
  END FUNCTION bigfunc

END PROGRAM prog1



PROGRAM prog2
  implicit none
  integer :: i,N
  double precision :: x,y,bigfunc

  ! N is very large ~10**12

  do i=1,N
    y = bigfunc(x)
    : : :
    : : :
  end do

END PROGRAM prog2

FUNCTION bigfunc(x)
  : : :
  : : :
  : : :
END FUNCTION bigfunc

Tim Prince

2004-10-26, 9:00 am


"Arindam Chakraborty" <arimail77@yahoo.com> wrote in message
news:4f273f02.0410252040.3477f713@posting.google.com...
>
> [1]. Is there any way to inline a function (maybe using CPP
> directives, or include statements) without replacing the function call
> with its source code?

Why isn't a module function under consideration? Have you already figured
out there is more than meets the eye to getting them optimized fully? At
least one of the compilers you mentioned lacks machinery to do that in some
cases. Depending on how you write your function, it may not matter.
INCLUDE, or a pre-processor equivalent, effectively copies everything in
automatically. I can't tell whether you want that. You might even have to
try all your choices to see how they work with each of your compilers.
>
> [2]. Does it matter in optimization and inlining (by compiler) if a
> function is defined using the CONTAINS statement within the calling
> program?

As far as syntax goes, there is every reason why CONTAINS should lead to
optimization at least as good as the other alternatives. In practice, it
may not be so: inlining machinery has been driven largely by the needs of C
compilers, and the external-function layout of your prog2 example is closer
to what C code looks like.
>
> [3]. Any tips for optimizing big loops and function calls would be very
> useful.

If you have a loop of significant size inside the function, and make enough
of the important details visible to the compiler locally, the questions of
in-lining optimization won't matter.


beliavsky@aol.com

2004-10-26, 9:00 am

arimail77@yahoo.com (Arindam Chakraborty) wrote in message news:<4f273f02.0410252040.3477f713@posting.google.com>...


A good compiler should optimize away loop-invariant tests within a
loop, so you could try writing

ifunc = 1
do i=1,N
  select case (ifunc)
  case (1)
    ! code for function 1
  case (2)
    ! code for function 2
  end select
end do

The risk of doing this was discussed in the thread "speed of tests in
loops" in this newsgroup at
http://groups.google.com/groups?hl=...ting.google.com
Paul Van Delst

2004-10-26, 3:58 pm

Arindam Chakraborty wrote:
> Hi,
> I wish to inline a function in a loop, but I can't manually replace
> the function call with its source code since the function is a HIGH
> maintenance function and will be updated/modified frequently. The loop
> is a high traffic loop and takes up 99% of the total run time of the
> code. Right now I am using compiler directives to do the inlining, but
> I would like to hardcode it into the program.


My first question is: why? That's a real request for information, not an indictment
of you for even considering it. Do you have some evidence that manually
inlining the code allows for better optimisations (and by that I presume you mean "it runs
faster")? My second question is: if you do inline the code manually, do you think that you
can do a better job of optimising it than the, or a, compiler? Again, that's not a swipe
at you - maybe you have a lot of experience in that sort of thing. I can imagine a
scenario where manual optimisations tested and tuned on one platform/compiler don't work
too well on others.

Loop unrolling is another optimisation that compiler options can be used for - have you
considered that also in your manual effort? Also, what about OpenMP and/or MPI? I don't
know if you have a multiple CPU machine (or an OS that can handle it), but maybe
parallelising (is that a word?) the code is a more efficient way to go?
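
As a rough illustration of the OpenMP suggestion, here is a minimal sketch. It assumes the iterations are independent and that the results are combined by a sum; neither is stated in the original post. The body of bigfunc is a hypothetical placeholder, since the real one is elided. The directive is part of the OpenMP Fortran API; when OpenMP is not enabled at compile time, it is treated as an ordinary comment and the loop runs serially.

```fortran
program omp_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000000   ! small stand-in for the real trip count
  integer :: i
  real(dp) :: total

  total = 0.0_dp

  ! Each thread accumulates a private partial sum; the partial sums are
  ! combined into total when the parallel loop ends.
  !$omp parallel do reduction(+:total)
  do i = 1, n
     total = total + bigfunc(real(i, dp))
  end do
  !$omp end parallel do

  print *, 'total =', total

contains

  ! Hypothetical placeholder for the poster's bigfunc. It must be free of
  ! side effects for the iterations to be independent.
  function bigfunc(x) result(y)
    real(dp), intent(in) :: x
    real(dp) :: y
    y = sqrt(x) / (1.0_dp + x)
  end function bigfunc

end program omp_sketch
```

Because the summation order changes with the number of threads, the parallel total may differ from the serial one in the last few digits.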

For me the big issue in a case like this is the high maintenance of the "bigfunc" code.
That alone would make me want to isolate it as much as possible from the calling code to
allow for several different developers to work on different versions without impacting the
rest of the code. (Of course, I'm assuming you use a source control utility like, e.g. CVS)

[snip]

> Here is an example (test codes are listed below):
>
> Given a compiler (xlf90, pgf90, ifort) and an optimization level (say
> O5), will program prog1 be better optimized than prog2?


I personally prefer the module procedure for other things than just optimisation
(automatic interface generation and checking). My gut tells me that a module procedure may
also be able to be optimised "better" but that's really me waving my arms about wildly and
wishing hard since I have no evidence (never checked).
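
For reference, a module-procedure version of the example might look like the sketch below. The module name bigfunc_mod, the program name prog3, and the one-line body of bigfunc are all hypothetical, since the real function is elided in the original post. The point is only that the USE statement gives the caller an explicit interface, so the compiler can check the argument types at the call site.

```fortran
module bigfunc_mod
  implicit none
  private
  public :: bigfunc
  integer, parameter, public :: dp = kind(1.0d0)

contains

  ! Placeholder body; the real bigfunc is not shown in the thread.
  function bigfunc(x) result(y)
    real(dp), intent(in) :: x
    real(dp) :: y
    y = x*x + 1.0_dp
  end function bigfunc

end module bigfunc_mod

program prog3
  use bigfunc_mod, only: bigfunc, dp
  implicit none
  integer :: i
  real(dp) :: x, y

  x = 2.0_dp
  do i = 1, 10
     ! The interface comes from the module, so passing, say, an integer
     ! here would be rejected at compile time.
     y = bigfunc(x)
  end do
  print *, 'y =', y
end program prog3
```

Whether a given compiler inlines bigfunc in this layout is still compiler-specific; the sketch shows the interface checking, not any guaranteed optimisation.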

Also, I've found that using performance libraries for target platforms has provided
greater speedups than the standard optimisations I'm familiar with (inlining, loop
unrolling). That experience is based only on IBM and SGI systems, but the speed increase
was about a factor of 2.

At any rate, if you're going to use an optimisation level of O5, make sure you test it
first at O1 or O2 to get "correct" answers to test against the O5 runs (with tolerances,
because it's quite likely the numbers will be different).
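
One way to do the comparison "with tolerances" is a relative-difference check along the lines of the sketch below. The helper name close_enough, the tolerance 1.0e-12, and the two sample values are assumptions for illustration; a suitable tolerance depends on the computation.

```fortran
program compare_runs
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: ref, opt

  ! Stand-ins for one value saved from a low-optimisation run (ref) and
  ! the same quantity from a high-optimisation run (opt).
  ref = 0.1_dp + 0.2_dp
  opt = 0.3_dp

  if (close_enough(ref, opt, 1.0e-12_dp)) then
     print *, 'results agree within tolerance'
  else
     print *, 'results differ:', ref, opt
  end if

contains

  ! True when a and b differ by at most rtol relative to the larger
  ! magnitude. tiny(a) keeps the right-hand side from being exactly zero.
  logical function close_enough(a, b, rtol)
    real(dp), intent(in) :: a, b, rtol
    close_enough = abs(a - b) <= rtol * max(abs(a), abs(b), tiny(a))
  end function close_enough

end program compare_runs
```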

> Thanks in advance,
> -Ari
>
> PROGRAM prog1
> implicit none
> integer :: i,N
> double precision :: x,y
>
> ! N is very large ~10**12


Hmm.... not for a "regular" integer, which typically is 4-bytes. By my finger counting
that makes the maximum value of N (and i) about 2e+09. Maybe you automatically promote
integers to 8-bytes via a compiler option?
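
A portable alternative to a promotion option is to ask for a wide enough integer kind in the source. The sketch below uses the standard intrinsic selected_int_kind; the kind name i8 is an arbitrary choice, and the loop runs only over the last three values so the example finishes quickly. A 4-byte integer tops out at 2147483647.

```fortran
program big_counter
  implicit none
  ! Request a kind that holds at least 13 decimal digits, so 10**12 fits.
  integer, parameter :: i8 = selected_int_kind(13)
  integer(i8) :: i, n

  print *, 'largest default integer =', huge(1)
  print *, 'largest integer(i8)     =', huge(1_i8)

  ! The literal carries the kind suffix so the power is evaluated in the
  ! wide kind rather than overflowing in default integer arithmetic.
  n = 10_i8**12

  ! Both the counter and the bound must be of the wide kind.
  do i = n - 2, n
     print *, 'i =', i
  end do
end program big_counter
```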

[snipped examples]

As I stated above, my preference is to use module procedures for the implicit (I hope I'm
using that word correctly here!) interfacing. Other experienced and regular contributors
to clf (James Giles comes to mind) sometimes recommend against module procedures due to
the compilation cascade problem -- which may be a significant issue for you given the high
maintenance of the "bigfunc" source code.

Reading over my reply, I didn't really help too much did I? Sorry 'bout that.

cheers,

paulv
Richard E Maine

2004-10-26, 3:58 pm

arimail77@yahoo.com (Arindam Chakraborty) writes:

> I wish to inline a function in a loop, but I can't manually replace
> the function call with its source code since the function is a HIGH
> maintenance function and will be updated/modified frequently...


Let me echo a variant of comments made by others. How sure are you
that inlining will be of measurable help anyway? Even though the loop
is most of the run time of the program, that doesn't mean that inlining
will necessarily help. Generally, inlining helps most with small
functions. When the function gets very complicated, the time required
to execute the body of the function tends to dwarf the call overhead.
Your description of the function as HIGH maintenance makes it sound
like the function is probably large enough to make it doubtful whether
inlining would help.

But I realize that there are a lot of qualifiers like "generally" in
the above para. Exceptions certainly exist. One could have a function
that was both high maintenance and yet also small run-time, etc.
So I'll assume that you actually would benefit from inlining... but I
do encourage you, who know (much) more about the situation than I do,
to consider the question.
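
One way to "consider the question" is simply to time the loop, built once with inlining enabled and once without. The sketch below is a hypothetical harness using the standard system_clock intrinsic; the body of bigfunc and the trip count are placeholders. The results are summed and printed so that the compiler cannot discard the loop as dead code.

```fortran
program time_loop
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 10000000   ! small stand-in for the real trip count
  integer :: i, t0, t1, rate
  real(dp) :: acc

  acc = 0.0_dp
  call system_clock(t0, rate)          ! start count and ticks per second
  do i = 1, n
     acc = acc + bigfunc(real(i, dp))
  end do
  call system_clock(t1)                ! end count

  print *, 'checksum        =', acc
  if (rate > 0) then
     print *, 'elapsed seconds =', real(t1 - t0, dp) / real(rate, dp)
  end if

contains

  ! Placeholder for the real, much larger function.
  function bigfunc(x) result(y)
    real(dp), intent(in) :: x
    real(dp) :: y
    y = sqrt(x) / (1.0_dp + x)
  end function bigfunc

end program time_loop
```

The sketch ignores clock wrap-around, which is acceptable for a short run but not for a long one.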

> [1]. Is there any way to inline a function (maybe using CPP
> directives, or include statements) without replacing the function call
> with its source code?


Both CPP and include are *PURELY* textual replacements. Really.
They are completely identical to writing out the same text and putting
it there. There are no subtleties of consequence here. The most
"subtle" point about include is that a statement that uses continuation
lines can't be split across an include boundary... and that isn't
exactly very subtle. CPP doesn't even have that issue, as it is completely
independent of the compiler.

*IF* pure textual replacement will work, then those methods would be fine.
I would caution, however, that pure textual replacement could cause
maintenance problems just because it has no special meaning. The compiler
won't have any notion of the included text as being a distinct function;
it will just be included text. I cannot emphasize too much just how
*NON*-special the included text would be.

This means that if you include the body of the function right in place
of where the function call was, you won't be able to have
declarations in it, just like you couldn't have declarations in source
text that you otherwise put in the middle of the executable code. Your
included text would share the same scope as the text around it. This
means that anyone writing code to be included is darn well going to
have to look at the whole subroutine while writing the include file.
You'd be hard-pressed to avoid having to also make changes to the main
file (to add declarations). Seems like this would negate much of the
whole point of separating it out in the first place.

If you are using include to just put the complete function (declarations
and all) effectively in the same source file as the calling procedure,
that's different. See below.

> [2]. Does it matter in optimization and inlining (by compiler) if a
> function is defined using the CONTAINS statement within the calling
> program?


Inlining is just too compiler-specific to make any generalizations that
are of much value. One could say lots of things about what *COULD*
happen, but what compilers actually do is a different matter. For
example, one could argue, as you allude above, that functions that
follow a CONTAINS ought to be good candidates for inlining... but that
doesn't mean that any particular compiler actually will do so.

Some compilers might be more likely to inline functions that are in
the same source file... and include or cpp could be used to have that
effect. But you can *NOT* just assume this will automatically help.
It is too compiler-specific. You really need to read the documentation
of your specific compiler to see what (if anything) it says about
inlining.

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain | experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain
James Giles

2004-10-26, 3:58 pm

I haven't been answering because the question is way too
vague and the answer depends on too many factors. But,
since I've been mentioned in the thread, I guess I'll add
my two cents.

Arindam Chakraborty wrote:
...
> PROGRAM prog1
> implicit none
> integer :: i,N
> double precision :: x,y
>
> ! N is very large ~10**12
>
> do i=1,N
> y = bigfunc(x)
> end do


> CONTAINS
> FUNCTION bigfunc(x)
> : : :
> : : :
> : : :
> END FUNCTION bigfunc


> END PROGRAM prog1


The other example is the same except bigfunc is made external.

First I notice that, unless the function has a side-effect on x,
or saves internal state or something, the content of the loop
will be invariant. Certainly if it *is* invariant, inlining it
would permit the compiler to detect that and optimize the
loop right out (always assuming the compiler applies such
optimizations *after* the inlining step - see below).
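
To make the invariance point concrete, here is a minimal sketch with a hypothetical side-effect-free bigfunc (declared PURE, which is a Fortran 95 feature). Because x never changes inside the loop, every iteration produces the same y, and a compiler that can see this is free to reduce the loop to a single call.

```fortran
program hoist_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000
  integer :: i
  real(dp) :: x, y, y_once

  x = 3.0_dp

  ! As posted: x is never modified, so each iteration recomputes the
  ! same value.
  do i = 1, n
     y = bigfunc(x)
  end do

  ! What the loop may be reduced to once the call is known to be invariant.
  y_once = bigfunc(x)

  print *, 'loop result   =', y
  print *, 'single result =', y_once

contains

  ! Placeholder body; PURE promises the function has no side effects.
  pure function bigfunc(x) result(y)
    real(dp), intent(in) :: x
    real(dp) :: y
    y = exp(-x) * sin(x)
  end function bigfunc

end program hoist_sketch
```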

Second, it should be the case that inlining never makes the
code slower, but it may not make the code significantly
faster. (Oh, I suppose it may be possible to contrive
examples using whole arrays that are copied in/out by
the parameter passing mechanism so that they're in-cache
inside the function but wouldn't be for the inlined code.
The non-inlined version might be significantly faster
for such a case. But a *really* smart compiler could make
such local copies for the inlined version too...)

How much benefit there is to inlining depends on (at least)
what hardware you're using, what compiler you're using,
what the loop does (in particular, whether some arguments
to the procedure, or shared module/common variables, are
loop invariants), and what the procedure does (in particular,
whether much of the work involves the particular args/shared-vars
that happen to be loop invariants). And, it may be that the procedure
does some work that's not dependent on its arguments - such "set-up"
computations may be loop invariant.

Even supposing the procedure has lots of loop-invariant
computation, the compiler may not inline until *after* the
phase in which it identifies loop-invariants. In that case, all
inlining will do is get rid of the call/return overhead. That may,
in itself, be a sizable saving if the procedure is short and such
overhead was dominating your speed. But, if the compiler is no
cleverer than that, your only real gain from inlining on a large
procedure would be if you do it "manually" (as someone pointed
out, that could be done with an INCLUDE construct).

It could be the case (maybe even should be) that compilers always
aggressively optimize internal procedures (those CONTAINED
inside the caller as in the above example), including inlining them
by default. Whether compilers do so is entirely up to the maker
of that compiler. This kind of issue arises often enough that
there's a standard phrase: "It's a quality of implementation
issue."

For many compilers these days, MODULE procedures are not
optimized differently than externals. Hypothetically, MODULE
procedures could be optimized as well as internal procedures.
But, that would increase the incidence of "compilation cascades",
and many compilers *don't* optimize MODULE procedures
aggressively entirely in order to avoid those. :-( Frankly, the
possibility of such optimization is the only really significant
advantage of MODULE procedures in my view. (Arg type matching
could be done on external procedures these days with "name-
mangling", though I favor allowing procedures to INCLUDE
and/or USE code containing INTERFACE specifications for
themselves.)
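
For comparison, argument checking for an external procedure is already possible if the caller supplies an INTERFACE block by hand. The sketch below is a variant of prog2 under that assumption; the program name prog2_iface and the body of bigfunc are hypothetical. The compiler checks the call against the interface block, but nothing in this layout forces the block to stay in step with the actual function, which is the maintenance gap mentioned above.

```fortran
program prog2_iface
  implicit none

  ! Hand-written description of the external function. It must be kept
  ! consistent with the real definition by the programmer.
  interface
     function bigfunc(x) result(y)
       double precision, intent(in) :: x
       double precision :: y
     end function bigfunc
  end interface

  double precision :: x, y

  x = 2.0d0
  y = bigfunc(x)
  print *, 'y =', y
end program prog2_iface

! External function, as in prog2. Placeholder body.
function bigfunc(x) result(y)
  implicit none
  double precision, intent(in) :: x
  double precision :: y
  y = x*x + 1.0d0
end function bigfunc
```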

--
J. Giles

"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare

