Code Comments
Hi,

I wish to inline a function in a loop, but I can't manually replace the function call with its source code, since the function is a HIGH maintenance function and will be updated/modified frequently. The loop is a high-traffic loop and takes up 99% of the total run time of the code. Right now I am using compiler directives to do the inlining, but I would like to hardcode it into the program. By the way, the code is written in F90.

[1]. Is there any way to inline a function (maybe using CPP directives, or include statements) without replacing the function call with its source code?

[2]. Does it matter in optimization and inlining (by compiler) if a function is defined using the CONTAINS statement within the calling program?

[3]. Any tips for optimizing big loops and function calls will be very useful.

Here is an example (test codes are listed below):

Given a compiler (xlf90, pgf90, ifort) and an optimization level (say O5), will program prog1 be better optimized than prog2?

Thanks in advance,
-Ari

PROGRAM prog1
  implicit none
  integer :: i,N
  double precision :: x,y

  ! N is very large ~10**12

  do i=1,N
    y = bigfunc(x)
  end do

CONTAINS
  FUNCTION bigfunc(x)
    : : :
    : : :
    : : :
  END FUNCTION bigfunc

END PROGRAM prog1


PROGRAM prog2
  implicit none
  integer :: i,N
  double precision :: x,y,bigfunc

  ! N is very large ~10**12

  do i=1,N
    y = bigfunc(x)
    : : :
    : : :
  end do

END PROGRAM prog2

FUNCTION bigfunc(x)
  : : :
  : : :
  : : :
END FUNCTION bigfunc

"Arindam Chakraborty" <arimail77@yahoo.com> wrote in message news:4f273f02.0410252040.3477f713@posting.google.com...
>
> [1]. Is there any way to inline a function (maybe using CPP
> directives, or include statements) without replacing the function call
> with its source code.

Why isn't a module function under consideration? Have you already figured out there is more than meets the eye to getting them optimized fully? At least one of the compilers you mentioned lacks machinery to do that in some cases. Depending how you write your function, it may not matter.

INCLUDE, or a pre-processor equivalent, effectively copies everything in automatically. I can't tell whether you want that. You might even have to try all your choices to see how they work with each of your compilers.

>
> [2]. Does it matter in optimization and inlining (by compiler) if a
> function is defined using the CONTAINS statement within the calling
> program?

As far as syntax goes, there is every reason why CONTAINS should lead to optimization at least as good as the other alternatives. In practice, it may not be so, due to the greater need for such optimization in C compilers, and the closer commonality of your prog2 example.

>
> [3]. Any tips for optimizing big loop and functions calls will be very
> useful.

If you have a loop of significant size inside the function, and make enough of the important details visible to the compiler locally, the questions of in-lining optimization won't matter.
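
For readers unfamiliar with the module-function alternative raised in the reply above, here is a minimal sketch of what it might look like. The names `bigfunc_mod` and `prog3` and the body of `bigfunc` are placeholders invented for illustration; the original post elides the real function body. Whether a given compiler inlines a module procedure is not guaranteed by this layout.

```fortran
! Sketch only: bigfunc_mod, prog3 and the function body are assumed names/content.
module bigfunc_mod
  implicit none
  private
  public :: bigfunc
contains
  function bigfunc(x) result(y)
    double precision, intent(in) :: x
    double precision :: y
    ! Placeholder arithmetic standing in for the elided function body.
    y = 0.5d0 * x + 1.0d0
  end function bigfunc
end module bigfunc_mod

program prog3
  ! The USE statement gives the caller an explicit interface to bigfunc,
  ! so argument types are checked at compile time.
  use bigfunc_mod, only: bigfunc
  implicit none
  integer :: i
  integer, parameter :: n = 10     ! small trip count for the demo
  double precision :: x, y

  x = 1.0d0
  do i = 1, n
    y = bigfunc(x)
    x = y                          ! feed the result back so the loop is not invariant
  end do
  print *, y
end program prog3
```

The function stays in its own program unit, so it can be maintained separately from the loop, while the compiler still sees its full definition if both units are compiled together.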

arimail77@yahoo.com (Arindam Chakraborty) wrote in message news:<4f273f02.0410252040.3477f713@posting.google.com>...
> Hi,
> I wish to inline a function in a loop, but I can't manually replace
> the function call with its source code since the function is a HIGH
> maintenance function and will be updated/modified frequently. The loop
> is a high traffic loop and takes up 99% of the total run time of the
> code. Right now I am using compiler directives to do the inlining, but
> I will like to hardcode it into the program. By the way, the code is
> written in F90.
>
> [1]. Is there any way to inline a function (maybe using CPP
> directives, or include statements) without replacing the function call
> with its source code.
>
> [2]. Does it matter in optimization and inlining (by compiler) if a
> function is defined using the CONTAINS statement within the calling
> program?
>
> [3]. Any tips for optimizing big loop and functions calls will be very
> useful.
>
> Here is an example (test codes are listed below):
>
> Given a compiler (xlf90, pgf90, ifort) and a optimization level (say
> O5) will program prog1 be better optimized than prog2 ?
>
> Thanks in advance,
> -Ari
>
> PROGRAM prog1
>   implicit none
>   integer :: i,N
>   double precision :: x,y
>
>   ! N is very large ~10**12
>
>   do i=1,N
>     y = bigfunc(x)
>   end do
>
> CONTAINS
>   FUNCTION bigfunc(x)
>     : : :
>     : : :
>     : : :
>   END FUNCTION bigfunc
>
> END PROGRAM prog1
>
>
> PROGRAM prog2
>   implicit none
>   integer :: i,N
>   double precision :: x,y,bigfunc
>
>   ! N is very large ~10**12
>
>   do i=1,N
>     y = bigfunc(x)
>     : : :
>     : : :
>   end do
>
> END PROGRAM prog2
>
> FUNCTION bigfunc(x)
>   : : :
>   : : :
>   : : :
> END FUNCTION bigfunc

A good compiler should optimize away loop-invariant tests within a loop, so you could try writing

ifunc = 1
do i=1,N
   select case (ifunc)
   case (1)
      ! code for function 1
   case (2)
      ! code for function 2
   end select
end do

The risk of doing this was discussed in the thread "speed of tests in loops" in this newsgroup at http://groups.google.com/groups?hl=...ting.google.com .

Arindam Chakraborty wrote:
> Hi,
> I wish to inline a function in a loop, but I can't manually replace
> the function call with its source code since the function is a HIGH
> maintenance function and will be updated/modified frequently. The loop
> is a high traffic loop and takes up 99% of the total run time of the
> code. Right now I am using compiler directives to do the inlining, but
> I will like to hardcode it into the program.

My first question is: why? That's a real request for information, not an indictment against you even considering it -- do you have some evidence that suggests manually inlining the code allows for better optimisations (and by that I presume you mean "it runs faster")?

My second question is: if you do inline the code manually, do you think that you can do a better job of optimising it than the, or a, compiler? Again, that's not a swipe at you - maybe you have a lot of experience in that sort of thing. I can imagine a scenario where manual optimisations tested and tuned on one platform/compiler don't work too hot on others.

Loop unrolling is another optimisation that compiler options can be used for - have you considered that also in your manual effort? Also, what about OpenMP and/or MPI? I don't know if you have a multiple CPU machine (or an OS that can handle it), but maybe parallelising (is that a word?) the code is a more efficient way to go?

For me the big issue in a case like this is the high maintenance of the "bigfunc" code. That alone would make me want to isolate it as much as possible from the calling code to allow for several different developers to work on different versions without impacting the rest of the code. (Of course, I'm assuming you use a source control utility like, e.g. CVS)

[snip]

> Here is an example (test codes are listed below):
>
> Given a compiler (xlf90, pgf90, ifort) and a optimization level (say
> O5) will program prog1 be better optimized than prog2 ?

I personally prefer the module procedure for other things than just optimisation (automatic interface generation and checking). My gut tells me that a module procedure may also be able to be optimised "better", but that's really me waving my arms about wildly and wishing hard, since I have no evidence (never checked).

Also, I've found that using performance libraries for target platforms has provided greater speedups than the standard optimisations I'm familiar with (inlining, loop unrolling). That experience is based only on IBM and SGI systems, but the speed increase was about a factor of 2.

At any rate, if you're going to use an optimisation level of O5, make sure you test it first at O1 or O2 to get "correct" answers to test against the O5 runs (with tolerances, because it's quite likely the numbers will be different).

> Thanks in advance,
> -Ari
>
> PROGRAM prog1
>   implicit none
>   integer :: i,N
>   double precision :: x,y
>
>   ! N is very large ~10**12

Hmm.... not for a "regular" integer, which typically is 4 bytes. By my finger counting that makes the maximum value of N (and i) about 2e+09. Maybe you automatically promote integers to 8 bytes via a compiler option?

[snipped examples]

As I stated above, my preference is to use module procedures for the implicit (I hope I'm using that word correctly here!) interfacing. Other experienced and regular contributors to clf (James Giles comes to mind) sometimes recommend against module procedures due to the compilation cascade problem -- which may be a significant issue for you given the high maintenance of the "bigfunc" source code.

Reading over my reply, I didn't really help too much did I? Sorry 'bout that.

cheers,

paulv
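
The integer-range remark in the reply above can be checked directly. The sketch below is not from the thread; the program name `int_range_demo` and the kind parameter `ik` are assumptions. It asks the compiler for an integer kind that covers at least 10**12 instead of relying on a promotion flag.

```fortran
! Sketch only: int_range_demo and ik are assumed names.
program int_range_demo
  implicit none
  ! selected_int_kind(12) requests a kind covering at least -10**12 .. 10**12.
  ! If the compiler has no such kind it returns -1 and the declarations below fail to compile.
  integer, parameter :: ik = selected_int_kind(12)
  integer(ik) :: i, n, trips

  print *, 'largest default integer: ', huge(0)
  print *, 'largest integer(ik):     ', huge(0_ik)

  n = 10_ik**12                 ! the loop bound from the original post
  print *, 'n = ', n

  ! Run only a short prefix of the loop so the demo finishes quickly.
  trips = 0
  do i = 1, min(n, 1000_ik)
    trips = trips + 1
  end do
  print *, 'iterations run: ', trips
end program int_range_demo
```

On compilers where the default integer is 4 bytes, `huge(0)` is typically 2147483647, which matches the "about 2e+09" estimate. Declaring the loop index and bound with an explicit kind keeps the code independent of any compiler option.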

arimail77@yahoo.com (Arindam Chakraborty) writes:

> I wish to inline a function in a loop, but I can't manually replace
> the function call with its source code since the function is a HIGH
> maintenance function and will be updated/modified frequently...

Let me echo a variant of comments made by others. How sure are you that inlining will be of measurable help anyway? Even though the loop is most of the run time of the program, that doesn't mean that inlining will necessarily help. Generally, inlining helps most with small functions. When the function gets very complicated, the time required to execute the body of the function tends to dwarf the call overhead. Your description of the function as HIGH maintenance makes it sound like the function is probably large enough to make it doubtful whether inlining would help.

But I realize that there are a lot of qualifiers like "generally" in the above para. Exceptions certainly exist. One could have a function that was both high maintenance and yet also small run-time, etc. So I'll assume that you actually would benefit from inlining... but I do encourage you, who know (much) more about the situation than I do, to consider the question.

> [1]. Is there any way to inline a function (maybe using CPP
> directives, or include statements) without replacing the function call
> with its source code.

Both CPP and include are *PURELY* textual replacements. Really. They are completely identical to writing out the same text and putting it there. There are no subtleties of consequence here. The most "subtle" point about include is that a statement that uses continuation lines can't be split across an include boundary... and that isn't exactly very subtle. CPP doesn't even have that issue, as it is completely independent of the compiler.

*IF* pure textual replacement will work, then those methods would be fine. I would caution, however, that pure textual replacement could cause maintenance problems just because it has no special meaning. The compiler won't have any notion of the included text as being a distinct function; it will just be included text. I cannot emphasize too much just how *NON*-special the included text would be.

This means that if you include the body of the function right in place of where the function call was, you won't be able to have declarations in it, just like you couldn't have declarations in source text that you otherwise put in the middle of the executable code. Your included text would share the same scope as the text around it. This means that anyone writing code to be included is darn well going to have to look at the whole subroutine while writing the include file. You'd be hard-pressed to avoid having to also make changes to the main file (to add declarations). Seems like this would negate much of the whole point of separating it out in the first place.

If you are using include to just put the complete function (declarations and all) effectively in the same source file as the calling procedure, that's different. See below.

> [2]. Does it matter in optimization and inlining (by compiler) if a
> function is defined using the CONTAINS statement within the calling
> program?

Inlining is just too compiler-specific to make any generalizations that are of much value. One could say lots of things about what *COULD* happen, but what compilers actually do is a different matter. For example, one could argue, as you allude above, that functions that follow a CONTAINS ought to be good candidates for inlining... but that doesn't mean that any particular compiler actually will do so.

Some compilers might be more likely to inline functions that are in the same source file... and include or cpp could be used to have that effect. But you can *NOT* just assume this will automatically help. It is too compiler-specific. You really need to read the documentation of your specific compiler to see what (if anything) it says about inlining.

--
Richard Maine                      | Good judgment comes from experience;
email: my first.last at org.domain | experience comes from bad judgment.
org: nasa, domain: gov             |        -- Mark Twain
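
The scoping consequence of purely textual inclusion described in the reply above can be illustrated in a single file. In this sketch the include file name `bigfunc_body.inc`, the variable `tmp`, and the arithmetic are all hypothetical; the marked lines are written out by hand exactly where an `include` line would paste them, so the example compiles on its own.

```fortran
! Sketch only: bigfunc_body.inc, tmp and the arithmetic are assumed for illustration.
program include_scope_demo
  implicit none
  integer :: i
  integer, parameter :: n = 5
  double precision :: x, y
  ! tmp is used only by the pasted body, yet it has to be declared here in the
  ! host: executable text pasted mid-loop cannot carry its own declarations.
  double precision :: tmp

  x = 1.0d0
  do i = 1, n
    ! --- begin text that "include 'bigfunc_body.inc'" would paste here ---
    tmp = x * x
    y = tmp + 1.0d0
    ! --- end pasted text ---
    x = 0.5d0 * y
  end do
  print *, y
end program include_scope_demo
```

Any new temporary added to the pasted body later would also need a matching declaration in the host program, which is the maintenance cost the reply warns about.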

I haven't been answering because the question is way too vague and the answer depends on too many factors. But, since I've been mentioned in the thread, I guess I'll add my two cents.

Arindam Chakraborty wrote:
...
> PROGRAM prog1
>   implicit none
>   integer :: i,N
>   double precision :: x,y
>
>   ! N is very large ~10**12
>
>   do i=1,N
>     y = bigfunc(x)
>   end do
> CONTAINS
>   FUNCTION bigfunc(x)
>     : : :
>     : : :
>     : : :
>   END FUNCTION bigfunc
> END PROGRAM prog1

The other example is the same except bigfunc is made external.

First I notice that, unless the function has a side-effect on x, or saves internal state or something, the content of the loop will be invariant. Certainly if it *is* invariant, inlining it would permit the compiler to detect that and optimize the loop right out (always assuming the compiler applies such optimizations *after* the inlining step - see below).

Second, it should be the case that inlining never makes the code slower, but it may not make the code significantly faster. (Oh, I suppose it may be possible to contrive examples using whole arrays that are copied in/out by the parameter passing mechanism so that they're in-cache inside the function but wouldn't be for the inlined code. The non-inlined version might be significantly faster for such a case. But a *really* smart compiler could make such local copies for the inlined version too...)

How much benefit there is to inlining depends on (at least) what hardware you're using, what compiler you're using, what the loop does (in particular, whether some arguments to the procedure, or shared module/common variables, are loop invariants), and what the procedure does (in particular, whether much of the work involves the particular args/shared-vars that happen to be loop invariants). And, it may be that the procedure does some work that's not dependent on its arguments - such "set-up" computations may be loop invariant.

Even supposing the procedure has lots of loop-invariant computation, the compiler may not inline until *after* the phase in which it identifies loop-invariants. In that case, all inlining will do is get rid of the call/return overhead. That may, in itself, be a sizable saving if the procedure is short and such overhead was dominating your speed. But, if the compiler is no cleverer than that, your only real gain from inlining on a large procedure would be if you do it "manually" (as someone pointed out, that could be done with an INCLUDE construct).

It could be the case (maybe even should be) that compilers always aggressively optimize internal procedures (those CONTAINED inside the caller as in the above example), including inlining them by default. Whether compilers do so is entirely up to the maker of that compiler. This kind of issue arises often enough that there's a standard phrase: "It's a quality of implementation issue."

For many compilers these days, MODULE procedures are not optimized differently than externals. Hypothetically, MODULE procedures could be optimized as well as internal procedures. But, that would increase the incidence of "compilation cascades", and many compilers *don't* optimize MODULE procedures aggressively entirely in order to avoid those. :-( Frankly, the possibility of such optimization is the only really significant advantage of MODULE procedures in my view. (Arg type matching could be done on external procedures these days with "name-mangling", though I favor allowing procedures to INCLUDE and/or USE code containing INTERFACE specifications for themselves.)

--
J. Giles

"I conclude that there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies." -- C. A. R. Hoare
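
The loop-invariance observation in the reply above can be made concrete with a small sketch. Everything here is assumed for illustration: the program name `hoist_demo`, the accumulator variables, and the body of `bigfunc`, which stands in for the elided original and has no side effects. It shows the same sum computed with the call inside the loop and with the call hoisted out by hand.

```fortran
! Sketch only: hoist_demo and the function body are assumed; bigfunc here has no side effects.
program hoist_demo
  implicit none
  integer :: i
  integer, parameter :: n = 1000
  double precision :: x, y, total_in_loop, total_hoisted

  x = 2.0d0

  ! Version 1: the call sits inside the loop, as in prog1.
  total_in_loop = 0.0d0
  do i = 1, n
    y = bigfunc(x)
    total_in_loop = total_in_loop + y
  end do

  ! Version 2: x never changes inside the loop, so the call is hoisted by hand.
  y = bigfunc(x)
  total_hoisted = 0.0d0
  do i = 1, n
    total_hoisted = total_hoisted + y
  end do

  ! Both totals are computed from the same value of y.
  print *, total_in_loop, total_hoisted

contains

  function bigfunc(x) result(r)
    double precision, intent(in) :: x
    double precision :: r
    ! Placeholder arithmetic with no side effects and no saved state.
    r = sqrt(x) + 1.0d0
  end function bigfunc

end program hoist_demo
```

Whether a compiler performs this transformation on its own depends on whether it can see that the function is free of side effects, and on the order of its inlining and invariant-detection passes, as the reply notes.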