Aart Bik 2004-07-12, 8:58 pm

"Toon Moene" <toon@moene.indiv.nluug.nl> wrote in message
news:40F2F74E.20405@moene.indiv.nluug.nl...
> DIMENSION A(1000000), B(1000000), C(1000000)
> READ*, X, Y
> A = LOG(X); B = LOG(Y); C = A + B
> PRINT*, C(500000)
> END
>
> A different and more difficult example is:
>
> SUBROUTINE SAXPY(X, Y, A, N)
> DIMENSION X(N), Y(N)
> Y = Y + A * X
> END
> where the alignment of X and Y isn't known.
This may interest a wider audience: the Intel compiler already vectorizes
all of these loops, and it reports this in its diagnostics, so no visual
inspection of the assembly is required.
The alignment complications are resolved at run time with dynamic loop
peeling and versioning.
[C:/cmplr/temp] ifort -QxN addlog.f
addlog.f(4) : (col. 6) remark: LOOP WAS VECTORIZED.
addlog.f(4) : (col. 18) remark: LOOP WAS VECTORIZED.
addlog.f(4) : (col. 30) remark: LOOP WAS VECTORIZED.
[C:/cmplr/temp] ifort -QxN saxpy.f
saxpy.f(3) : (col. 7) remark: LOOP WAS VECTORIZED.
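
For readers unfamiliar with the terms, here is a rough sketch in C of what
dynamic loop peeling and versioning mean for the SAXPY loop. It is not the
code the compiler actually generates. The names peel_count and saxpy_peeled
and the 16-byte vector width are assumptions made for illustration. The inner
k loops stand in for single vector instructions, and C is used only because
the alignment test needs the address of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16                          /* assumed vector width in bytes */
#define VEC_LEN (VEC_BYTES / sizeof(float))   /* floats per vector */

/* Hypothetical helper: number of leading scalar iterations needed before
   p + peel sits on a VEC_BYTES boundary, capped at n. */
static size_t peel_count(const float *p, size_t n) {
    assert((uintptr_t)p % sizeof(float) == 0);  /* floats are naturally aligned */
    size_t mis = (uintptr_t)p % VEC_BYTES;
    size_t peel = mis ? (VEC_BYTES - mis) / sizeof(float) : 0;
    return peel < n ? peel : n;
}

/* y = y + a * x, structured the way a peeled and versioned loop would be. */
void saxpy_peeled(const float *x, float *y, float a, size_t n) {
    size_t i = 0;
    size_t peel = peel_count(y, n);

    /* Peel loop: scalar iterations until y + i is aligned. */
    for (; i < peel; i++)
        y[i] += a * x[i];

    /* Versioning: a run-time test selects one of two vector loops. */
    if ((uintptr_t)(x + i) % VEC_BYTES == 0) {
        /* Version 1: x and y both aligned (aligned loads and stores). */
        for (; i + VEC_LEN <= n; i += VEC_LEN)
            for (size_t k = 0; k < VEC_LEN; k++)
                y[i + k] += a * x[i + k];
    } else {
        /* Version 2: y aligned, x not (unaligned loads for x). */
        for (; i + VEC_LEN <= n; i += VEC_LEN)
            for (size_t k = 0; k < VEC_LEN; k++)
                y[i + k] += a * x[i + k];
    }

    /* Cleanup loop: leftover iterations that do not fill a vector. */
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

In plain C the two versions have identical bodies. The difference only
appears in the generated instructions, where the first version can use
aligned memory accesses for both arrays.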
Read all about this in the upcoming book on compiler optimizations for
multimedia extensions:
A.J.C. Bik. The Software Vectorization Handbook.
Applying Multimedia Extensions for Maximum Performance.
Intel Press, June, 2004.
http://www.intel.com/intelpress/sum_vmmx.htm
--
Aart Bik, Senior Staff Engineer, Intel Corporation
2200 Mission College Blvd. SC12-301, Santa Clara CA 95052
email: aart.bik@intel.com URL: http://www.aartbik.com/