Home > Archive > Fortran > May 2005 > Optimizing exponentiation
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Optimizing exponentiation
|
|
| Jean-Baptiste FAURE 2005-05-17, 8:59 am |
| Dear all,
is it possible to compute A**(2./3.) and A**(1./6.) without (and faster than)
exponentiation operator ?
JBF
| |
| Michel OLAGNON 2005-05-17, 8:59 am |
|
Jean-Baptiste FAURE wrote:
> Dear all,
>
> is it possible to compute A**(2./3.) and A**(1./6.) without (and faster
> than) exponentiation operator ?
>
> JBF
Yes X*SQRT(X)-A=0.0 can be solved (but not likely faster) and
why don't you ask your vendor for a spare if your exponentiation
operator is missing ?
| |
| Carlie J. Coats 2005-05-17, 8:59 am |
| Jean-Baptiste FAURE wrote:
> Dear all,
>
> is it possible to compute A**(2./3.) and A**(1./6.) without (and faster
> than) exponentiation operator ?
>
> JBF
An obvious ploy is to compute
X = LOG( A )
A2_3 = EXP( 2.0*x/3.0 )
A1_6 = EXP( X/6.0 )
(and better yet is to introduce PARAMETERs for 2./3. and 1./6.
-- Carlie
| |
| Michael Metcalf 2005-05-17, 8:59 am |
|
"Michel OLAGNON" <molagnon@ifremer-a-oter.fr> wrote in message
news:4289B61C.4020405@ifremer-a-oter.fr...
>
>
>
> Yes X*SQRT(X)-A=0.0 can be solved (but not likely faster) and
> why don't you ask your vendor for a spare if your exponentiation
> operator is missing ?
>
Also, once you've got, say, b = a**(1./6.) then the other result is b**4,
which is faster, in principle, than a second floating exponentiation (the
compiler should generate a couple of multiplications).
Regards,
Mike Metcalf
| |
| glen herrmannsfeldt 2005-05-17, 8:59 am |
| Jean-Baptiste FAURE wrote:
> is it possible to compute A**(2./3.) and A**(1./6.) without (and faster
> than) exponentiation operator ?
SQRT is usually done using Newton-Raphson, and I believe that
small rational powers can be done fast that way, and also likely a
little more accurately.
The time needed to write the routine is not usually made up in
execution speed, though.
Why do you want to know?
-- glen
| |
| David Jones 2005-05-17, 8:59 am |
| Jean-Baptiste FAURE wrote:
> Dear all,
>
> is it possible to compute A**(2./3.) and A**(1./6.) without (and
> faster than) exponentiation operator ?
>
> JBF
Some compilers may include a direct CBRT function (cube root) as an
extension, or you may find one in a mathematical library if you are
already using one. You might then find that combining this with
squaring and SQRT'ing is faster than exponentiation.
One advantage of avoiding the A**P formulation is that you can arrange
to get a sensible answer for some cases for A<0 where there is an
obvious answer that might be the one you want...
e.g. for A<0, CBRT(A)= - CBRT(ABS(A))
(CBRT(A))**2= (CBRT(ABS(A)))**2
whereas A**(2./3.) would fail for A<0 (and for A=0).
David Jones
| |
| Jean-Baptiste FAURE 2005-05-17, 3:59 pm |
| Le 17/05/2005 11:15, Michel OLAGNON a écrit :
>
>
> Jean-Baptiste FAURE wrote:
>
>
>
>
> Yes X*SQRT(X)-A=0.0 can be solved (but not likely faster) and
> why don't you ask your vendor for a spare if your exponentiation
> operator is missing ?
>
In fact I use g95 and after having profiling my code I found that more than 20%
of time is spent in exponentiation. I have removed integer exponentiations like
A**2 or A**3 but I don't know how to improve computation time of exponentiations
like A**(2./3.) or A**(1./6.) (which are not related to the others).
If I find CBRT (cubic root) function, as David Jones suggested, I can rewrite
exponentiation in the same way as A**(3./2.) = A*sqrt(A).
A**(2./3.) = CBRT(A*A)
A**(1./6.) = CBRT(sqrt(A))
Is it a good idea, from CPU time point of view, to write this function myself
using Newton method ?
JBF
| |
| Michel OLAGNON 2005-05-17, 3:59 pm |
|
Jean-Baptiste FAURE wrote:
> Le 17/05/2005 11:15, Michel OLAGNON a écrit :
>
> In fact I use g95 and after having profiling my code I found that more
> than 20% of time is spent in exponentiation. I have removed integer
> exponentiations like A**2 or A**3 but I don't know how to improve
> computation time of exponentiations like A**(2./3.) or A**(1./6.) (which
> are not related to the others).
> If I find CBRT (cubic root) function, as David Jones suggested, I can
> rewrite exponentiation in the same way as A**(3./2.) = A*sqrt(A).
> A**(2./3.) = CBRT(A*A)
> A**(1./6.) = CBRT(sqrt(A))
>
> Is it a good idea, from CPU time point of view, to write this function
> myself using Newton method ?
>
You can't know until you try. A few iterations seem OK in single precision.
read (*,*) A0
a = A0
do i= 1, 10
a = a - 2.*(a*sqrt(a) - A0)/(3.*sqrt(a))
write (*,*) a, a*sqrt(a)
enddo
end
| |
| Tim Prince 2005-05-17, 3:59 pm |
| David Jones wrote:
> Some compilers may include a direct CBRT function (cube root) as an
> extension, or you may find one in a mathematical library if you are
> already using one. You might then find that combining this with
> squaring and SQRT'ing is faster than exponentiation.
>
> One advantage of avoiding the A**P formulation is that you can arrange
> to get a sensible answer for some cases for A<0 where there is an
> obvious answer that might be the one you want...
> e.g. for A<0, CBRT(A)= - CBRT(ABS(A))
> (CBRT(A))**2= (CBRT(ABS(A)))**2
> whereas A**(2./3.) would fail for A<0 (and for A=0).
>
Some may include a vectorized version of cube root, possibly invoked by
A(:)**(1/3.)
--
Tim Prince
| |
| Michael Metcalf 2005-05-17, 3:59 pm |
|
"Jean-Baptiste FAURE" <faure@lyon.cemagref.fr> wrote in message
news:d6ckeu$i7o$1@demo2.univ-lyon1.fr...
> I have removed integer exponentiations like
> A**2 or A**3
That was almost certainly unnecessary. Exponentiation to 'low' integers is
normally carried out by an approriate sequence of multiplies, e.g A**4
becomes A*A and the result multiplied by itself.
Regards,
Mike Metcalf
| |
| Jean-Baptiste FAURE 2005-05-17, 3:59 pm |
| Le 17/05/2005 14:50, Michael Metcalf a écrit :
> "Jean-Baptiste FAURE" <faure@lyon.cemagref.fr> wrote in message
> news:d6ckeu$i7o$1@demo2.univ-lyon1.fr...
>
>
>
> That was almost certainly unnecessary. Exponentiation to 'low' integers is
> normally carried out by an approriate sequence of multiplies, e.g A**4
> becomes A*A and the result multiplied by itself.
>
Using g95 with -O3 optimization I obtained a reduction of CPU time by around 10%.
Regards
JBF
| |
|
| Hi,
for :
subroutine s1(a,b,c,d,e)
real :: a,b,c,d,e,f
integer, parameter :: sp=4
b=a**0.5_sp
c=a**(1.0_sp/3.0_sp)
d=a**(1.0_sp/6.0_sp)
e=a**(4.0_sp/6.0_sp)
end subroutine s1
g95 will generate four calls to pow_r4 (even with -O2 -ffast-math, the
minimal set of optimisation I would recommend).
Ifort generates at -O2 : fsqrt, cbrtf and twice powf.
I'm really surprised that ifort doesn't optimise this better by
computing a**(1.0/6.0) and multiplying things together (this kind of
stuff seems ot occur a lot in real life codes).
To me this leaves the option suggested by Michael Metcalf as the best
solution, compute
d=a**(1.0/6.0)
and get the other values by multiplication
c=d*d
e=c*c
b=c*d
The advantage of leaving the single call to '**' in the code is that
things remain rather clean (i.e. no convergence problems, good
accuracy, ...). As usual, check very carefully that the code gives the
same results before and after hand-optimising it. It is far too easy to
introduce mistakes....
Cheers,
Joost
Cheers,
Joost
| |
| David Jones 2005-05-17, 3:59 pm |
| Jean-Baptiste FAURE wrote:
> If I find CBRT (cubic root) function, as David Jones suggested, I
can
> rewrite exponentiation in the same way as A**(3./2.) = A*sqrt(A).
> A**(2./3.) = CBRT(A*A)
> A**(1./6.) = CBRT(sqrt(A))
>
>
> JBF
An old version of CBRT I wrote some time ago follows ... DAJ
REAL FUNCTION CBRT(X)
IMPLICIT NONE
REAL X
C CUBE ROOT FUNCTION
C
C
REAL XX,FACT,Y
XX=X
IF(XX)100,1,200
1 CBRT=0.
RETURN
100 FACT=-1.
XX=-XX
GOTO 300
200 FACT=1.
300 IF(XX-1.)310,301,320
301 CBRT=FACT
RETURN
310 FACT=FACT/2.
XX=XX*8.
IF(XX-1.)310,301,400
C
320 IF(XX-8.)400,321,330
321 CBRT=2.*FACT
RETURN
330 FACT=FACT*2.
XX=XX/8.
IF(XX-8.)400,321,330
C
400 Y=XX-1.
Y=(1.+Y*(578.+13.*Y)/1029.)/(1.+87.*Y/343.)
Y=(2.*Y+XX/(Y*Y))/3.
Y=(2.*Y+XX/(Y*Y))/3.
CBRT=FACT*Y
RETURN
END
| |
| glen herrmannsfeldt 2005-05-17, 8:58 pm |
| Michel OLAGNON wrote:
> Jean-Baptiste FAURE wrote:
(snip regarding cube root, and other integer roots)
With SQRT written in assembly, one divides the exponent part of the
floating point number by two, and then generates a good starting value
with that. The ones I have seen are two iterations for single and four
iterations for double precision after that.
The C library includes frexp() and ldexp() to separate the exponent
and fraction from a floating point value, and to put them back together.
frexp() could be coded in assembler, as system dependent routines, or in
a high level language using a binary search.
[color=darkred]
> You can't know until you try. A few iterations seem OK in single precision.
>
> read (*,*) A0
> a = A0
> do i= 1, 10
> a = a - 2.*(a*sqrt(a) - A0)/(3.*sqrt(a))
> write (*,*) a, a*sqrt(a)
> enddo
> end
With a good starting point this should converge fast, though
a more usual one would do without the sqrt() inside the loop.
-- glen
| |
| glen herrmannsfeldt 2005-05-17, 8:58 pm |
| David Jones wrote:
(snip)
> An old version of CBRT I wrote some time ago follows ... DAJ
Wow, a program using five arithmetic IF statements, each going
to three different statements!
> REAL FUNCTION CBRT(X)
> IMPLICIT NONE
> REAL X
>
> C CUBE ROOT FUNCTION
> C
> C
> REAL XX,FACT,Y
> XX=X
> IF(XX)100,1,200
> 1 CBRT=0.
> RETURN
> 100 FACT=-1.
> XX=-XX
> GOTO 300
> 200 FACT=1.
> 300 IF(XX-1.)310,301,320
> 301 CBRT=FACT
> RETURN
> 310 FACT=FACT/2.
> XX=XX*8.
> IF(XX-1.)310,301,400
> C
> 320 IF(XX-8.)400,321,330
> 321 CBRT=2.*FACT
> RETURN
> 330 FACT=FACT*2.
> XX=XX/8.
> IF(XX-8.)400,321,330
> C
> 400 Y=XX-1.
> Y=(1.+Y*(578.+13.*Y)/1029.)/(1.+87.*Y/343.)
> Y=(2.*Y+XX/(Y*Y))/3.
> Y=(2.*Y+XX/(Y*Y))/3.
> CBRT=FACT*Y
> RETURN
> END
| |
| Richard E Maine 2005-05-17, 8:58 pm |
| In article <JoidnQUHYNBDpxffRVn-sw@comcast.com>,
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
> The C library includes frexp() and ldexp() to separate the exponent
> and fraction from a floating point value, and to put them back together.
> frexp() could be coded in assembler, as system dependent routines, or in
> a high level language using a binary search.
Or, it might also be of some slight relevance to this newsgroup :-) to
note that Fortran also has such intrinsics.
See the fraction, exponent, and scale intrinsics in F90 or later.
--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain | experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain
| |
| glen herrmannsfeldt 2005-05-17, 8:58 pm |
| Richard E Maine wrote:
> In article <JoidnQUHYNBDpxffRVn-sw@comcast.com>,
> glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
[color=darkred]
> Or, it might also be of some slight relevance to this newsgroup :-) to
> note that Fortran also has such intrinsics.
> See the fraction, exponent, and scale intrinsics in F90 or later.
Did they steal them from C? (I was recently reading in "Secrets of
the Code", a book about "The DaVinci Code", of "plagiarism by
anticipation". The early christian church explained the devil having
written the true story of Jesus before it actually happened.)
For some reason intrinsic function names are not in the index, so I
don't always find them. I did look up as many things as I could
find, though.
So, using those functions one can make a much better guess for cube
roots, or other Newton-Raphson root finders. Note that they are also
useful for writing EXP and LOG routines used for non-integer powers.
-- glen
| |
| Richard E Maine 2005-05-17, 8:58 pm |
| In article <_MCdnfAZh9NIzRffRVn-sw@comcast.com>,
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
> Richard E Maine wrote:
>
>
> Did they steal them from C?
I wasn't there, but I rather doubt it. I didn't even know that C had
such functions, though I see that the man command does reference the C99
standard; I don't know whether they were new to C99 or not. I might have
a copy of the previous C standard around somewhere, but I'm not seeing
it at the moment. (I do have a copy of C99).
I'd rather suspect instead that they were added just because it is
generally useful functionality. F90 added quite a lot of stuff related
to numeric properties. Numerics being generally considered an important
thing in Fortran, those intrinsics were some of the ones I thought
particularly appropriate. Nice to get rid of the gory machine-dependent
code in LAPACK et al. - the kind of stuff traditionally in one of
Fortran's core strength areas. The 3 in question here aren't highest on
my personal list of ones that I use (in fact, I had forgotten about
scale() until I checked), but they are part of the set.
The Fortran functions aren't really defined very similarly to the C ones
anyway. Note that the C ones are hard-wired to base 2, while the Fortran
ones aren't. Certainly isn't a very direct "steal" in any case.
These are basic enough functionality that I don't think one has to look
at other languages to come up with the thought that they would be
useful. One might as well ask if other languages "stole" addition and
multiplication from Fortran.
--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain | experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain
| |
| James Giles 2005-05-17, 8:58 pm |
| glen herrmannsfeldt wrote:
> Jean-Baptiste FAURE wrote:
>
>
> SQRT is usually done using Newton-Raphson, and I believe that
> small rational powers can be done fast that way, and also likely a
> little more accurately.
Depends on the environment, There is a hardware algorithm
for square root that takes the same time as a divide operation
and is exact (that is, computed as if to infinite precision and
then correctly rounded). This algorithm is not related to
Newton-Raphson in any way.
Of course there was at least one machine where the hardware
divide (and later the square root) were themselves done using
Newton-Raphson. These were definitely *not* exact. So, it
depends on the environment.
--
J. Giles
"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare
| |
| James Giles 2005-05-17, 8:58 pm |
| glen herrmannsfeldt wrote:
> Richard E Maine wrote:
....
>
>
> Did they steal them from C? [...]
Well, C almost certainly stole them from elsewhere anyway,
so why is it relevant where the idea came from. They were
recommended in the IEEE standard (which had late 1970's
vintage deliberations and widespread discussions). Functions
to perform such operations probably existed in some applications
since floating point was invented.
--
J. Giles
"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare
| |
| glen herrmannsfeldt 2005-05-17, 8:58 pm |
| Richard E Maine wrote:
> In article <_MCdnfAZh9NIzRffRVn-sw@comcast.com>,
> glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
[color=darkred]
[color=darkred]
[color=darkred]
> I wasn't there, but I rather doubt it. I didn't even know that C had
> such functions, though I see that the man command does reference the C99
> standard; I don't know whether they were new to C99 or not. I might have
> a copy of the previous C standard around somewhere, but I'm not seeing
> it at the moment. (I do have a copy of C99).
They are at least to C89, and maybe before then. (My man pages still
reference C89. Maybe I need a newer system.)
(snip)
> The Fortran functions aren't really defined very similarly to the C ones
> anyway. Note that the C ones are hard-wired to base 2, while the Fortran
> ones aren't. Certainly isn't a very direct "steal" in any case.
I am not sure what C does on S/370. It usually takes some small changes
to the routines to make them work right for hex floating point, anyway,
and portable C libraries might not do those right.
> These are basic enough functionality that I don't think one has to look
> at other languages to come up with the thought that they would be
> useful. One might as well ask if other languages "stole" addition and
> multiplication from Fortran.
Amazing that C even uses the same operators for addition and
multiplication as Fortran! Still, I think it was more traditional
to write Fortran libraries in assembler than for C (and unix), where
portability to different machines was more common.
-- glen
-- glen
| |
| glen herrmannsfeldt 2005-05-17, 8:58 pm |
| James Giles wrote:
> glen herrmannsfeldt wrote:
(snip)
[color=darkred]
> Depends on the environment, There is a hardware algorithm
> for square root that takes the same time as a divide operation
> and is exact (that is, computed as if to infinite precision and
> then correctly rounded). This algorithm is not related to
> Newton-Raphson in any way.
Yes, I meant for software implementations. I am not sure which
algorithms are used for hardware square root. They aren't so
good at documenting that.
> Of course there was at least one machine where the hardware
> divide (and later the square root) were themselves done using
> Newton-Raphson. These were definitely *not* exact. So, it
> depends on the environment.
The 360/91 has Newton-Raphson floating point divide, more accurate
than (and so incompatible with) the S/360 architecture. (S/360
specifies a truncated quotient, the 91 rounds.) The 360/91 doesn't do
square roots in hardware, though.
Otherwise, S/360 sqrt() algorithms are slightly different than the
usual one, with the last iteration
Y4=(X/Y3 - Y3)/2 + Y3
plus some extra tricks to make it round right, due to the effect of a
base 16 exponent. The Fortran library routines were some of the
samples I had for learning assembly language some years ago.
-- glen
| |
| Jean-Baptiste FAURE 2005-05-18, 8:57 am |
| Le 17/05/2005 13:27, Jean-Baptiste FAURE a écrit :
> Le 17/05/2005 11:15, Michel OLAGNON a écrit :
>
> In fact I use g95 and after having profiling my code I found that more
> than 20% of time is spent in exponentiation. I have removed integer
> exponentiations like A**2 or A**3 but I don't know how to improve
> computation time of exponentiations like A**(2./3.) or A**(1./6.) (which
> are not related to the others).
> If I find CBRT (cubic root) function, as David Jones suggested, I can
> rewrite exponentiation in the same way as A**(3./2.) = A*sqrt(A).
> A**(2./3.) = CBRT(A*A)
> A**(1./6.) = CBRT(sqrt(A))
>
> Is it a good idea, from CPU time point of view, to write this function
> myself using Newton method ?
>
> JBF
I wrote the following code giving a 9% faster code (on only one test case) by
changing 5 lines (from 30000) with expressions like A**(1./3.), A**(2./3.) and
A**(1./6.) where exponents are defined by parameters.
function cubic_root(x)
!=======================================
=======================================
! calcul la racine cubique de x par la méthode de Newton
!=======================================
=======================================
use parametres, only : long
!prototype
real(kind=long), intent(in) :: x
real(kind=long) :: cubic_root
!
real(kind=long) :: y, y2, y3, dy
integer :: k
real(kind=long), parameter :: untier=0.33333333333_long
if (x < 0) then
write(*,*) '>>>> Erreur dans CUBIC_ROOT() : argument negatif'
stop
else
y = sqrt(x)
do k = 1, 20
y2 = y*y ; y3 = y2 * y
dy = (y3-x)/(3._long*y2)
!y3 = y * y * y
!dy = y*(untier-x/(3._long*y3))
if (abs(dy) > y*1.E-08_long) then
y = y-dy
else
exit
endif
enddo
cubic_root = y
endif
end function cubic_root
Initialization by sqrt is, for the moment, the best starting point I can think
to. Some tests shown that less than 7 iterations are sufficient to converge.
The commented lines are a less efficient alternative.
JBF
| |
| Tim Prince 2005-05-18, 4:00 pm |
| Jean-Baptiste FAURE wrote:
> I wrote the following code giving a 9% faster code (on only one test case)
> by changing 5 lines (from 30000) with expressions like A**(1./3.),
> A**(2./3.) and A**(1./6.) where exponents are defined by parameters.
>
> function cubic_root(x)
> !=======================================
=======================================
> ! calcul la racine cubique de x par la méthode de Newton
> !=======================================
=======================================
> use parametres, only : long
> !prototype
> real(kind=long), intent(in) :: x
> real(kind=long) :: cubic_root
> !
> real(kind=long) :: y, y2, y3, dy
> integer :: k
> real(kind=long), parameter :: untier=0.33333333333_long
>
> if (x < 0) then
> write(*,*) '>>>> Erreur dans CUBIC_ROOT() : argument negatif'
> stop
> else
> y = sqrt(x)
> do k = 1, 20
> y2 = y*y ; y3 = y2 * y
> dy = (y3-x)/(3._long*y2)
> !y3 = y * y * y
> !dy = y*(untier-x/(3._long*y3))
> if (abs(dy) > y*1.E-08_long) then
> y = y-dy
> else
> exit
> endif
> enddo
> cubic_root = y
> endif
> end function cubic_root
>
> Initialization by sqrt is, for the moment, the best starting point I can
> think to. Some tests shown that less than 7 iterations are sufficient to
> converge.
Unless you handle the exponent during initialization in one of the usual
ways, the number of iterations will depend strongly on argument value.
--
Tim Prince
| |
|
| Jean-Baptiste FAURE wrote:
>
> I wrote the following code giving a 9% faster code (on only one test case) by
> changing 5 lines (from 30000) with expressions like A**(1./3.), A**(2./3.) and
> A**(1./6.) where exponents are defined by parameters.
You can probably squeeze out some more with a direct (simplified)
recursion,
y = (2*y + x/y**2)/3
|
|
|
|
|