Home > Archive > Fortran > April 2007 > MATMUL implementations
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
MATMUL implementations
|
|
| Bart Vandewoestyne 2007-04-26, 4:08 am |
| During some of the Fortran lab-sessions I teach, I let my
students perform timings on certain algorithms. One of the
timings they had to do is for different ways of doing
matrix-multiplications, namely with 3 naive loops, with MATMUL
and with the DGEMM BLAS routine. We used the CPU_TIME intrinsic
to do our timings.
During our lab-session where we ran the experiments on the
departemental computers, we noticed how MATMUL was quite a bit
faster than the naive implementation with 3 for-loops. This was
also what i intuitively expected. But now one of my students
mentions that on his own laptop the MATMUL timings are the same
as the timings for the 3 for-loops (where the indices are chosen
optimally considering the column-oriented ordering of Fortran).
The question of the student was then if this could be possible.
He wondered if the standard says anything about the
*implementation* of MATMUL.
My answer would be that the standard indeed doesn't specify
anything about the *practical implementation* of MATMUL. So compiler
writers are free to choose what algorithms they use to implement
MATMUL. The fact that he notices almost no difference between
MATMUL and the 3 for-loops is probably because his compiler
(which he doesn't mension) doesn't use an optimal MATMUL
implementation.
Any comments on this are welcome, i like to give decent feedback
to my students :-)
Thanks,
Bart
--
"Share what you know. Learn what you don't."
| |
| Ian Bush 2007-04-26, 4:08 am |
| As if by magic, Bart Vandewoestyne appeared !
>
> My answer would be that the standard indeed doesn't specify
> anything about the *practical implementation* of MATMUL. So compiler
> writers are free to choose what algorithms they use to implement
> MATMUL. The fact that he notices almost no difference between
> MATMUL and the 3 for-loops is probably because his compiler
> (which he doesn't mension) doesn't use an optimal MATMUL
> implementation.
>
That's my understanding of the situation. The standard doesn't
say how TAN should be implemented so why should MATMUL be any
different ?
In fact when f90 compilers first came out my experience was that
for most the performance of MATMUL was very poor, and as a result
to this day I still avoid its use in time critical parts of the
code. However more recent experiences by some of my colleagues
are more positive,
Ian
| |
| Tim Prince 2007-04-26, 7:05 pm |
| Bart Vandewoestyne wrote:
> During some of the Fortran lab-sessions I teach, I let my
> students perform timings on certain algorithms. One of the
> timings they had to do is for different ways of doing
> matrix-multiplications, namely with 3 naive loops, with MATMUL
> and with the DGEMM BLAS routine. We used the CPU_TIME intrinsic
> to do our timings.
>
> During our lab-session where we ran the experiments on the
> departemental computers, we noticed how MATMUL was quite a bit
> faster than the naive implementation with 3 for-loops. This was
> also what i intuitively expected. But now one of my students
> mentions that on his own laptop the MATMUL timings are the same
> as the timings for the 3 for-loops (where the indices are chosen
> optimally considering the column-oriented ordering of Fortran).
>
> The question of the student was then if this could be possible.
> He wondered if the standard says anything about the
> *implementation* of MATMUL.
>
> My answer would be that the standard indeed doesn't specify
> anything about the *practical implementation* of MATMUL. So compiler
> writers are free to choose what algorithms they use to implement
> MATMUL. The fact that he notices almost no difference between
> MATMUL and the 3 for-loops is probably because his compiler
> (which he doesn't mension) doesn't use an optimal MATMUL
> implementation.
Or, he may have a laptop which doesn't include hardware support for the
optimized version of MATMUL supplied with the compiler.
Your lab setting looks like an ideal place to use gfortran-4.3
-fexternal-blas. This should allow you to link, static or dynamic,
against a full range of pre-built and your own implementations of ?GEMM.
You could drop all the general cases from your own ?GEMM and concentrate
on your own hardware platform and the case invoked by your test harness.
This gfortran feature recognizes the need to pick a fairly simple MATMUL
implementation (such as libgfortran provides) for moderate sized
problems, and a fancier version, such as ACML or MKL or Goto, with more
specific platform oriented optimizations, for larger problems.
| |
| Dick Hendrickson 2007-04-26, 7:05 pm |
| Bart Vandewoestyne wrote:
> During some of the Fortran lab-sessions I teach, I let my
> students perform timings on certain algorithms. One of the
> timings they had to do is for different ways of doing
> matrix-multiplications, namely with 3 naive loops, with MATMUL
> and with the DGEMM BLAS routine. We used the CPU_TIME intrinsic
> to do our timings.
>
> During our lab-session where we ran the experiments on the
> departemental computers, we noticed how MATMUL was quite a bit
> faster than the naive implementation with 3 for-loops. This was
Gee, I hope this is a typo. You mentioned "for-loops" several
times in the post. DO loops are much better. They're likely
to be better optimized on all processors. True, a simple
for-loop looks and acts pretty much like a DO loop. But, not
all compilers will recognize that.
There's a simple rule of thumb about when FOR loops are better
than DO loops for general purpose performance: "NEVER!"
At least that's my opinion ;).
Dick Hendrickson
> also what i intuitively expected. But now one of my students
> mentions that on his own laptop the MATMUL timings are the same
> as the timings for the 3 for-loops (where the indices are chosen
> optimally considering the column-oriented ordering of Fortran).
>
> The question of the student was then if this could be possible.
> He wondered if the standard says anything about the
> *implementation* of MATMUL.
>
> My answer would be that the standard indeed doesn't specify
> anything about the *practical implementation* of MATMUL. So compiler
> writers are free to choose what algorithms they use to implement
> MATMUL. The fact that he notices almost no difference between
> MATMUL and the 3 for-loops is probably because his compiler
> (which he doesn't mension) doesn't use an optimal MATMUL
> implementation.
>
> Any comments on this are welcome, i like to give decent feedback
> to my students :-)
>
> Thanks,
> Bart
>
| |
| Richard Maine 2007-04-26, 7:05 pm |
| Dick Hendrickson <dick.hendrickson@att.net> wrote:
> Gee, I hope this is a typo. You mentioned "for-loops" several
> times in the post. DO loops are much better....
Not to speak of the fact that there is no such thing as a "for loop".
The FOR construct is an array assignment; it is distinctly *NOT* a loop.
See multiple explanations of this in previous postings here.
Yes, there are cases where it can have the same end result as a loop,
but that doesn't make it a loop. Thinking of it as a loop will get you
in trouble and likely end up with you asking why it doesn't act like one
in other cases.
--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
| |
| Bart Vandewoestyne 2007-04-26, 7:05 pm |
| On 2007-04-26, Richard Maine <nospam@see.signature> wrote:
> Dick Hendrickson <dick.hendrickson@att.net> wrote:
>
>
> Not to speak of the fact that there is no such thing as a "for loop".
> The FOR construct is an array assignment; it is distinctly *NOT* a loop.
> See multiple explanations of this in previous postings here.
>
> Yes, there are cases where it can have the same end result as a loop,
> but that doesn't make it a loop. Thinking of it as a loop will get you
> in trouble and likely end up with you asking why it doesn't act like one
> in other cases.
Gee... I'm always amazed about how I carefully have to select my
words in this group in order to avoid starting unintentionally a
whole other sub-thread about a completely off-topic topic ;-)
So yes, I mentioned "for-loops" but I meant "do-loops". That
must have been my matlab-background influencing me there...
But anyway, I guess most readers understood what I meant. There
is only a very small minority here (isn't it, Richard ;-) who
enjoys to explain everything up to the very gory details :P
Regards,
Bart
--
"Share what you know. Learn what you don't."
| |
| glen herrmannsfeldt 2007-04-26, 7:05 pm |
| Bart Vandewoestyne wrote:
(snip on FOR loops and DO loops)
> Gee... I'm always amazed about how I carefully have to select my
> words in this group in order to avoid starting unintentionally a
> whole other sub-thread about a completely off-topic topic ;-)
> So yes, I mentioned "for-loops" but I meant "do-loops". That
> must have been my matlab-background influencing me there...
Mortran has both FOR and DO loops, with DO loops translating
to Fortran DO loops, and FOR loops translating to IF/GOTO
loops. The FOR test condition allows the increment and final
value to change anytime, and still give the right result.
(I am currently working on a project in Mortran.)
-- glen
| |
| Richard Maine 2007-04-26, 7:05 pm |
| Bart Vandewoestyne <MyFirstName.MyLastName@telenet.be> wrote:
> On 2007-04-26, Richard Maine <nospam@see.signature> wrote:
[color=darkred]
> So yes, I mentioned "for-loops" but I meant "do-loops". That
> must have been my matlab-background influencing me there...
>
> But anyway, I guess most readers understood what I meant. There
> is only a very small minority here (isn't it, Richard ;-) who
> enjoys to explain everything up to the very gory details :P
I understand how such a mistake is easy to make, but...
I wouldn't count on most readers necessarily understanding for sure what
you meant, since there *ARE* many people who refer to FOR constructs as
loops. I didn't notice any replier who made it clear that they
unambiguously understood the reference. Two of them were on other
aspects and didn't mention that at all; Dick said that he hoped you
meant DO loops.
I'm not correcting it just to be pedantic. I'm correcting it because
people write code that doesn't work when they think of FOR constucts as
loops. And then they ask here why their code doesn't work.
--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
| |
| Wade Ward 2007-04-26, 7:05 pm |
|
"glen herrmannsfeldt" <gah@ugcs.caltech.edu> wrote in message
news:WKCdnchnmfexkazbnZ2dnUVZ_hudnZ2d@co
mcast.com...
> Bart Vandewoestyne wrote:
> (snip on FOR loops and DO loops)
>
>
>
> Mortran has both FOR and DO loops, with DO loops translating
> to Fortran DO loops, and FOR loops translating to IF/GOTO
> loops. The FOR test condition allows the increment and final
> value to change anytime, and still give the right result.
> (I am currently working on a project in Mortran.)
Migrating from for to do loops is what you do when you switch from C to
Fortran.
I was wondering about the terms in this elsethread post:
"I google and found lots of terms like loop interchange, loop tilting,
loop blocking, and many others. I want to know is this techniques
worth the effort."
What would, say, loop tilting look like in fortran?
--
WW
| |
| Craig Powers 2007-04-26, 7:05 pm |
| Richard Maine wrote:
> Bart Vandewoestyne <MyFirstName.MyLastName@telenet.be> wrote:
>
>
>
>
> I understand how such a mistake is easy to make, but...
>
> I wouldn't count on most readers necessarily understanding for sure what
> you meant, since there *ARE* many people who refer to FOR constructs as
> loops. I didn't notice any replier who made it clear that they
> unambiguously understood the reference. Two of them were on other
> aspects and didn't mention that at all; Dick said that he hoped you
> meant DO loops.
>
> I'm not correcting it just to be pedantic. I'm correcting it because
> people write code that doesn't work when they think of FOR constucts as
> loops. And then they ask here why their code doesn't work.
It must be my exposure to multiple other languages, but I don't even
think of FOR being the same as FORALL. If the "ALL" isn't there, I'm
thinking of something different.
| |
| Wade Ward 2007-04-26, 7:05 pm |
|
"Craig Powers" <enigma@hal-pc.org> wrote in message
news:463137d4$0$597$a726171b@news.hal-pc.org...
> Richard Maine wrote:
>
> It must be my exposure to multiple other languages, but I don't even think
> of FOR being the same as FORALL. If the "ALL" isn't there, I'm thinking
> of something different.
In Fortran, for, as a control structure, is associated with "forall."
"Forall" sounds like a total pain. So, for purposes of control structures,
do as I do. I wonder how MM's converter treats this. Do I recall correctly
that MR&C calls forall antiquated?
--
WW
| |
| Michael Metcalf 2007-04-26, 10:03 pm |
|
"Wade Ward" <invalid@invalid.net> wrote in message
news:iOednXAe1LAfpqzbnZ2dnUVZ_uygnZ2d@co
mcast.com...
> In Fortran, for, as a control structure, is associated with "forall."
The for-loop is so well known from other languages that I cannot agree. The
for-loop was, IIRC, proposed for inclusion in Fortran 90.
> "Forall" sounds like a total pain. So, for purposes of control
> structures, do as I do. I wonder how MM's converter treats this.
It's not intended to covert anything other than source form plus some simple
syntax changes. As forall is a f90 statement, it would be flagged as unknown
and copied verbatim.
> Do I recall correctly that MR&C calls forall antiquated?
> --
You certainly do not!!!
Regards,
Mike Metcalf
| |
| Beliavsky 2007-04-26, 10:03 pm |
| On Apr 26, 7:24 pm, "Michael Metcalf" <michaelmetc...@compuserve.com>
wrote:
<snip>
> As forall is a f90 statement, it would be flagged as unknown
> and copied verbatim.
FORALL was introduced in Fortran 95, not Fortran 90.
| |
| Gary Scott 2007-04-26, 10:03 pm |
| Michael Metcalf wrote:
> "Wade Ward" <invalid@invalid.net> wrote in message
> news:iOednXAe1LAfpqzbnZ2dnUVZ_uygnZ2d@co
mcast.com...
>
>
>
>
> The for-loop is so well known from other languages that I cannot agree. The
> for-loop was, IIRC, proposed for inclusion in Fortran 90.
The for loop I'm familiar with is identical to a do loop (DO spelled
FOR), from VOS FORTRAN 77.
>
>
>
>
> It's not intended to covert anything other than source form plus some simple
> syntax changes. As forall is a f90 statement, it would be flagged as unknown
> and copied verbatim.
>
>
>
>
> You certainly do not!!!
>
> Regards,
>
> Mike Metcalf
>
>
--
Gary Scott
mailto:garylscott@sbcglobal dot net
Fortran Library: http://www.fortranlib.com
Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html
If you want to do the impossible, don't hire an expert because he knows
it can't be done.
-- Henry Ford
| |
| Tim Prince 2007-04-26, 10:03 pm |
| Michael Metcalf wrote:
>
>
> You certainly do not!!!
>
In fact, MR&C has much more favorable comments about forall than many
which have been posted here. So, I keep it in some benchmark suites,
even though it doesn't do anything special in practice, other than
enforce some restrictions which should avoid obstacles to optimization.
I do remember the FOR loops in certain varieties of extended f77, but
those were gone well before f90 compilers came on the scene, so the
situation doesn't confuse even me.
| |
| Jan Vorbrüggen 2007-04-27, 4:09 am |
| > What would, say, loop tilting look like in fortran?
Like elsewhere: it's either a typo or a pun.
Jan
| |
| Dick Hendrickson 2007-04-27, 7:05 pm |
| Wade Ward wrote:
> "glen herrmannsfeldt" <gah@ugcs.caltech.edu> wrote in message
> news:WKCdnchnmfexkazbnZ2dnUVZ_hudnZ2d@co
mcast.com...
> Migrating from for to do loops is what you do when you switch from C to
> Fortran.
>
> I was wondering about the terms in this elsethread post:
> "I google and found lots of terms like loop interchange, loop tilting,
> loop blocking, and many others. I want to know is this techniques
> worth the effort."
> What would, say, loop tilting look like in fortran?
> --
> WW
>
> Maybe
D
* O
* I
* =
* 1
* ,
* 9
It might also be a typo for loop tiling.
Dick Hendrickson
| |
| Paul van Delst 2007-04-27, 7:05 pm |
| Bart Vandewoestyne wrote:
> On 2007-04-26, Richard Maine <nospam@see.signature> wrote:
>
> Gee... I'm always amazed about how I carefully have to select my
> words in this group in order to avoid starting unintentionally a
> whole other sub-thread about a completely off-topic topic ;-)
Well, it's not really off topic if the words in question make the actual topic ambiguous.
> So yes, I mentioned "for-loops" but I meant "do-loops". That
> must have been my matlab-background influencing me there...
I have a bunch of software (some mine, some others) that is full of "do what I meant, not
what I said" code. As you can probably guess, none of it works quite right. Gotta love
the ol' plausibly wrong result. :o)
> But anyway, I guess most readers understood what I meant. There
> is only a very small minority here (isn't it, Richard ;-) who
> enjoys to explain everything up to the very gory details :P
And lucky we are to have them. Given the fields that most of the readers of clf are in --
sciences, some involving things that fly OR go boom -- the lack of correction of these
sorts of little nitty gritty details could unintentionally involve things that fly AND go
boom. :o(
cheers,
paulv
--
Paul van Delst Ride lots.
CIMSS @ NOAA/NCEP/EMC Eddy Merckx
| |
| Wade Ward 2007-04-27, 7:05 pm |
|
"Dick Hendrickson" <dick.hendrickson@att.net> wrote in message
news:a1pYh.82334$VU4.13604@bgtnsc05-news.ops.worldnet.att.net...
> Wade Ward wrote:
>
> D
> * O
> * I
> * =
> * 1
> * ,
> * 9
>
> It might also be a typo for loop tiling.
I like it when my whopping mistakes turn out to be funny. Do the words
interchange, tiling and blocking have relevance to fortran when the word
"loop" precedes?
--
WW
| |
| Dick Hendrickson 2007-04-27, 7:05 pm |
| Wade Ward wrote:
> "Dick Hendrickson" <dick.hendrickson@att.net> wrote in message
> news:a1pYh.82334$VU4.13604@bgtnsc05-news.ops.worldnet.att.net...
> I like it when my whopping mistakes turn out to be funny. Do the words
> interchange, tiling and blocking have relevance to fortran when the word
> "loop" precedes?
> --
> WW
>
>
I'm not sure about "tiling", that might be a synonym for "blocking".
Loop interchange is the process of interchanging loops (what else?)
usually to improve memory access patterns. Something like
do I = 1,N
do J = 1,M
A(I,J) = 0
enddo; enddo
becomes
do J = 1,M
do I = 1,N
A(I,J) = 0
enddo; enddo
because it marches through memory in Fortran's (natural?) column
order. usually, the compiler has to do some dependency analysis
to make sure that the transformation, which causes out-of-order
execution wrt the original, does no harm.
Loop blocking (to me) means the process of breaking up a loop
nest into chunks that are likely to fit nicely into the cache.
Two loops like
do I = 0,511
do j = 0,511
might be rewritten as
do II = 0,3
do I = 128*II, 128*(II+1)-1
do JJ = 0,3
do J = 128*JJ, 128*(JJ+1)-1
and then those loops would be interchanged to let the new
innermost two loops run over a 128X128 section of memory.
The hope being that those chunks fit well in the cache.
This is often done for matrix multiply like operations where
one subscript has to run the "wrong way" and thrashes the cache
for large enough matrices.
The other one you hear about is loop unrolling. It's obvious for
small loops Do I = 1,3: A(I)=0; enddo is unrolled into 3 simple
assignment statements. It's more useful for loops where the amount
of computation in the loop is comparable (or even less than) the
time taken for the index increment test and branch. Something
like
DO I = 1,N
A(I) = B(I) + C(I)
ENDDO
becomes
DO I = 1,N/2
A(I) = B(I) + C(I)
A(I+1) = B(I+1) + C(I+1)
ENDDO
(with some fooling around in case n is odd). Compilers are
pretty good at estimating execution and test times and unroll
enough to do the right thing. The unrolled loops can then
be interchanged or whatever.
Dick Hendrickson
| |
| Victor Eijkhout 2007-04-28, 7:04 pm |
| Bart Vandewoestyne <MyFirstName.MyLastName@telenet.be> wrote:
> and with the DGEMM BLAS routine.
Insufficient information.
If you downloaded the blas from netlib, then this is the same as the
naive triply-nested loop implementation. On the other hand, if you have
mkl or whatever vendor library, than you should get close to processor
peak performance.
Victor.
--
Victor Eijkhout -- eijkhout at tacc utexas edu
| |
| glen herrmannsfeldt 2007-04-30, 4:09 am |
| Michael Metcalf wrote:
> "Wade Ward" <invalid@invalid.net> wrote in message
> news:iOednXAe1LAfpqzbnZ2dnUVZ_uygnZ2d@co
mcast.com...
[color=darkred]
If you write it as FOR ALL, which is legal in fixed form, then
I might agree. The keyword seems to be FORALL without a space.
There are some exceptions in free-form where two keywords are allowed
to be written without blanks between them, but I don't see an exception
that allows FORALL to be written as FOR ALL.
To me the connection to the FOR statement of many other
languages is weak.
-- glen
| |
| Wade Ward 2007-04-30, 7:06 pm |
|
"glen herrmannsfeldt" <gah@ugcs.caltech.edu> wrote in message
news:SeadnXKifu9eAajbnZ2dnUVZ_viunZ2d@co
mcast.com...
> Michael Metcalf wrote:
>
>
>
> If you write it as FOR ALL, which is legal in fixed form, then
> I might agree. The keyword seems to be FORALL without a space.
>
> There are some exceptions in free-form where two keywords are allowed
> to be written without blanks between them, but I don't see an exception
> that allows FORALL to be written as FOR ALL.
>
> To me the connection to the FOR statement of many other
> languages is weak.
I had a very incomplete notion of what the forall statement and construct
was, believing, among other things, that it was antiquated. Now that I've
pored over MR&C 6.9 for a w end, I would call forall a means by which
assignment in an array is optimized, especially for parallel processing. It
isn't an easy part of the language to understand.
--
WW
|
|
|
|
|