Home > Archive > Fortran > May 2005 > Computing exp(z)
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
|
|
| gh14tq5@yahoo.com 2005-05-19, 4:01 pm |
| Hi,
Are there any di vantages/pitfalls to computing exp(z) as
exp(z) = exp(real(z))*(cos(aimag(z)),sin(aimag(z)
))
I've noticed that computing exp(z) takes a noticeable portion of
computation time in my program, and was able to shave a few percentages
of the total time by computing the exponential in the above way. Are
there any other suggestions for computing exp(z) faster? I thought
about a lookup table, but it doesn't seem as convenient since z is
complex. I'm using CVF 6.6C and WinXP.
Thanks,
John
| |
| Tim Prince 2005-05-19, 4:01 pm |
| gh14tq5@yahoo.com wrote:
> Hi,
>
> Are there any di vantages/pitfalls to computing exp(z) as
>
> exp(z) = exp(real(z))*(cos(aimag(z)),sin(aimag(z)
))
>
> I've noticed that computing exp(z) takes a noticeable portion of
> computation time in my program, and was able to shave a few percentages
> of the total time by computing the exponential in the above way. Are
> there any other suggestions for computing exp(z) faster? I thought
> about a lookup table, but it doesn't seem as convenient since z is
> complex. I'm using CVF 6.6C and WinXP.
If this is faster, it may be due to the direct use in-line of the fastest
x87 on-chip math intrinsics. This could lose accuracy for large magnitude
double precision values, but it shouldn't be visible in single precision.
Also, it would likely increase the size of your .obj and .exe.
--
Tim Prince
| |
| gh14tq5@yahoo.com 2005-05-19, 4:01 pm |
| Tim Prince wrote:
> gh14tq5@yahoo.com wrote:
>
>
>
> If this is faster, it may be due to the direct use in-line of the fastest
> x87 on-chip math intrinsics. This could lose accuracy for large magnitude
> double precision values, but it shouldn't be visible in single precision.
> Also, it would likely increase the size of your .obj and .exe.
Hmm, interesting. I don't know much about how the x87 chip works (well,
nothing at all actually). My code is 99% single precision, so hopefully
the accuracy won't be a problem. The imaginary part of the argument is
probably limited to less than 10-20 in magnitude, although the real part
can get fairly large. I'm not really concerned about the size of the
executable. It's about 2 Mb at the moment.
Thanks,
John
| |
| David Frank 2005-05-19, 4:01 pm |
|
<gh14tq5@yahoo.com> wrote in message news:d6i38f$a7j$1@domitilla.aioe.org...
> Tim Prince wrote:
>
>
> Hmm, interesting. I don't know much about how the x87 chip works (well,
> nothing at all actually). My code is 99% single precision, so hopefully
> the accuracy won't be a problem. The imaginary part of the argument is
> probably limited to less than 10-20 in magnitude, although the real part
> can get fairly large. I'm not really concerned about the size of the
> executable. It's about 2 Mb at the moment.
>
> Thanks,
> John
CVF can use the INTEL FPU instructions to evaluate EXP as below test assy
shows
! ----------
program test
real(8) :: x
do i = 1,2
x = exp(i+10.0d0)
write (*,*) x
end do
end program
compiling with command line CVF options:
DF /fast /Fa test.f90
the following snip from TEST.ASM file shows the instructions used to
evaluate EXP
lab$0003: ; 000004
fld qword ptr -8[ebp] ; 000005
xor edx, edx ; 000006
lea eax, dword ptr -48[ebp]
fadd qword ptr .literal$+8 ; 000005
lea ecx, dword ptr -40[ebp] ; 000006
mov dword ptr -40[ebp], edx
lea edx, dword ptr .literal$
ffree st(6) ; 000005
ffree st(7)
fldl2e
fmul st(0), st(1)
fst st(1)
frndint
fxch st(1)
fsub st(0), st(1)
f2xm1
fld1
faddp st(1), st(0)
fscale
fstp st(1)
fstp qword ptr -48[ebp] ; 000006
sub esp, 4
push eax
push edx
push 59047680
push -1
push ecx
call _for_write_seq_lis
| |
| Ron Shepard 2005-05-19, 4:01 pm |
| In article <d6htql$rbq$1@domitilla.aioe.org>,
"gh14tq5@yahoo.com" <gh14tq5@yahoo.com> wrote:
> Are there any di vantages/pitfalls to computing exp(z) as
>
> exp(z) = exp(real(z))*(cos(aimag(z)),sin(aimag(z)
))
One possible di vantage is that this is not the correct equation.
At least it isn't the one that I've always used. Did you compare
results to see if your equation was correct? What you want, I
think, is something like:
r = abs(z)
theta = atan2(real(z),aimag(z))
expz = exp(r) * complx( cos(theta), sin(theta) )
Of course, you need to test for r=0.0 and so on, but you get the
point. I doubt that the intrinsic exp(z) is slower than this.
$.02 -Ron Shepard
| |
| James Van Buskirk 2005-05-19, 4:01 pm |
| "Ron Shepard" <ron-shepard@NOSPAM.comcast.net> wrote in message
news:ron-shepard-9A2B99.11040919052005@comcast.dca.giganews.com...
> r = abs(z)
> theta = atan2(real(z),aimag(z))
> expz = exp(r) * complx( cos(theta), sin(theta) )
I think the identity here is
z = r*cmplx(cos(theta),sin(theta),kind(z))
Pre-coffee?
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
| |
| David Frank 2005-05-19, 4:01 pm |
| ! ----------
program test
real :: x(10000000), t1, t2
call random_number(x)
call cpu_time(t1)
x = exp(x)
call cpu_time(t2)
write (*,*) t2-t1
write (*,*) x(2)
end program
CVF Pentium 2.8 Ghz
compiled with /fast switch = 1.750 sec runtime
without = 0.986 sec
| |
| David Frank 2005-05-19, 4:01 pm |
| reverse the results just posted
its /fast = 0.986 sec
> compiled with /fast switch = 1.750 sec runtime
> without = 0.986 sec
>
| |
|
|
James Van Buskirk wrote:
> "Ron Shepard" <ron-shepard@NOSPAM.comcast.net> wrote in message
> news:ron-shepard-9A2B99.11040919052005@comcast.dca.giganews.com...
>
>
> I think the identity here is
>
> z = r*cmplx(cos(theta),sin(theta),kind(z))
>
> Pre-coffee?
>
well, my after dinner observation is that you'd still need to swap the
arguments of atan2 to get your identity really accurate ...
Cheers,
Joost
| |
| glen herrmannsfeldt 2005-05-19, 4:01 pm |
| David Frank wrote:
> ! ----------
> program test
> real :: x(10000000), t1, t2
The OP asked for COMPLEX, though...
If you want to compile a complex EXP and post the assembly code
then we will know what it does. Likely, though, a call to CEXP.
-- glen
| |
| James Van Buskirk 2005-05-19, 4:01 pm |
| "Joost" <jv244@cam.ac.uk> wrote in message
news:1116522936.837692.217940@g43g2000cwa.googlegroups.com...
> well, my after dinner observation is that you'd still need to swap the
> arguments of atan2 to get your identity really accurate ...
:)
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
| |
| Tim Prince 2005-05-19, 8:58 pm |
| Ron Shepard wrote:
> In article <d6htql$rbq$1@domitilla.aioe.org>,
> "gh14tq5@yahoo.com" <gh14tq5@yahoo.com> wrote:
>
>
> One possible di vantage is that this is not the correct equation.
> At least it isn't the one that I've always used. Did you compare
> results to see if your equation was correct? What you want, I
> think, is something like:
>
> r = abs(z)
> theta = atan2(real(z),aimag(z))
> expz = exp(r) * complx( cos(theta), sin(theta) )
>
> Of course, you need to test for r=0.0 and so on, but you get the
> point. I doubt that the intrinsic exp(z) is slower than this.
The complex absolute value part of this can be quite slow, if it is done
carefully and without taking advantage of extended precision/range. If not
done carefully, it could break for |z| > 1e19. It will be quite difficult
to do this correctly and still beat CVF performance.
--
Tim Prince
| |
| Richard E Maine 2005-05-19, 8:58 pm |
| In article <d6iruv$3ag$1@news01.intel.com>,
Tim Prince <tprince@computer.org> wrote:
> It will be quite difficult
> to do this correctly and still beat CVF performance.
One would sort of expect that to be the way of things.
If a user can correctly duplicate the complete functionality of an
intrinsic procedure and do so faster than the intrinsic, then either
1. The intrinsic was implemented pretty sloppily.
2. The user is exceptionally talented and compiler vendors should
consider hiring him.
or
3. It didn't actually correctly duplicate the complete functionality.
Case 3 is probably the most likely, though the others are certainly
possible and have probably happened at one time or other.
Of course, there are times when the complete functionality isn't needed.
That's a different story. If you know the range of values that will be
used and don't have to fuss with edge cases, maybe this can turn into
speedups. Likewise you might be able to trade off some accuracy for
speed if the answer doesn't have to be accurate to the last bit.
--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain | experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain
| |
| gh14tq5@yahoo.com 2005-05-20, 4:00 am |
| Hi Ron,
Maybe it's too early in the morning for me, but I don't see why I can't
use my original equation. For example, if
z = a+j*b, (a,b are real)
then
exp(z)=exp(a+j*b)=exp(a)*exp(j*b)
=exp(a)*[cos(b)+j*sin(b)]
=exp(a)*cmplx(cos(b),sin(b))
Hopefully I didn't make some silly mistake in that algebra. There
might be some numeric advantage of
using
exp(z)=r*cmplx(cos(theta),sin(theta))
that I'm unaware of, but off the top of my head I don't see why one
would be preferred to the other. As I mentioned in my previous post,
the imaginary values of z in my problem are limited to a small band
around the real axis.
John
| |
| gh14tq5@yahoo.com 2005-05-20, 4:00 am |
| Hi Richard,
Richard E Maine wrote:
> One would sort of expect that to be the way of things.
>
> If a user can correctly duplicate the complete functionality of an
> intrinsic procedure and do so faster than the intrinsic, then either
>
> 1. The intrinsic was implemented pretty sloppily.
I don't know how the intrinsic is implemented.
>
> 2. The user is exceptionally talented and compiler vendors should
> consider hiring him.
>
2. This is definitely not the case. Half the time, I'm amazed my
programs even work at all.
> or
>
> 3. It didn't actually correctly duplicate the complete functionality.
>
> Case 3 is probably the most likely, though the others are certainly
> possible and have probably happened at one time or other.
>
> Of course, there are times when the complete functionality isn't
needed.
> That's a different story. If you know the range of values that will
be
> used and don't have to fuss with edge cases, maybe this can turn into
> speedups. Likewise you might be able to trade off some accuracy for
> speed if the answer doesn't have to be accurate to the last bit.
I think this is the case. I do not expect my implementation to be
accurate for all values of z. But, in my program, the range of z
values is limited. In this range, the substitute seems to give
identical results to the intrinsic. My problem is basically an inverse
Fourier transform ( int(f(w)exp(-j*w*t)dw) ) that has been slighly
deformed into the complex plane for small values (in magnitude) of the
variable w to avoid some problem points. For large values of w, w is
limited to the real axis. t is real and and limited in magnitude.
John
| |
| Greg Lindahl 2005-05-20, 4:00 am |
| In article <d6iruv$3ag$1@news01.intel.com>,
Tim Prince <tprince@_nospam_computer.org> wrote:
>The complex absolute value part of this can be quite slow, if it is done
>carefully and without taking advantage of extended precision/range. If not
>done carefully, it could break for |z| > 1e19. It will be quite difficult
>to do this correctly and still beat CVF performance.
SGI discovered long ago that a lot of code seems to use complex
numbers that are always small enough in absolute value to be able to
take the shortcut and not overflow. So they (and we) have a command
line option for this assertion (-OPT:fast_complex=ON). For example, if
you're doing phyics, and you *know* the maximum amplitude your
detector can emit...
This option isn't part of any of our bundles of options because it's
unsafe in general.
-- greg
| |
| Ron Shepard 2005-05-20, 4:00 am |
| In article <1116545609.183489.265020@o13g2000cwo.googlegroups.com>,
gh14tq5@yahoo.com wrote:
> exp(z)=exp(a+j*b)=exp(a)*exp(j*b)
> =exp(a)*[cos(b)+j*sin(b)]
> =exp(a)*cmplx(cos(b),sin(b))
>
> Hopefully I didn't make some silly mistake in that algebra.
Yes, your equation was right. I had a brain twitch. :-)
$.02 -Ron Shepard
|
|
|
|
|