For Programmers: Free Programming Magazines  


Home > Archive > Fortran > February 2005 > dgemm subroutine in BLAS - I think I've cracked the difference, please confirm









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author dgemm subroutine in BLAS - I think I've cracked the difference, please confirm
Dawn Minnis

2005-02-12, 8:57 am

ok

Oh do you know I may have just got a blow to the head there. By any chance
does LDA,B,C define the size of one dimension of the BIG BIG BIG outer
matrix and m,n,k define the dimensions of the actual matrix to be worked
upon. And going on that, LDA, B, C explain what the jump is between one
element in row eg 3 and the next element (ie in the next col)?

Ok, if thats right then I get it. I was actually working on it the wrong
way round thinking that m,n,k defined the outer matrix and LDA,B,C defined
the inner size.

So I build a matrix first of all, being of size A=(LDA, k) B=(LDB, n) and
C=(LDC, n) and then only work with the submatrix A=(m, k) B=(k, n) and C=(m,
n).

ok, (as you can see I'm talking this out as I understand it so even I can
see what I'm talking about), so in the current state of memory allocation,
given that there is a function malloc() which I am using to allocate the
required memory for A,B and C, is there acually any need to define LDA,B,C
anymore? I mean, if you are wanting to perform multiplication of matrix A*B
then why bother defining a larger matrix to begin with that takes up
memory??

Btw, thanks for the humour throughout this thread, I've been at the point of
tearing my hair out for ages now and this really helps.

Dawn



"glen herrmannsfeldt" <gah@ugcs.caltech.edu> wrote in message
news:cujkbb$flh$1@gnus01.u.washington.edu...
> Dawn Minnis wrote:
>
>
>
>
> In the Fortran 66 days it was because dynamic allocation didn't exist, so
> you allocate arrays big enough for the largest problem you might run into.
> Say, though, that you want to read the array in. You could do something
> like:
>
> DOUBLE PRECISION A(100,101), B(101,100), C(102,100)
>
> READ(*,*) m, k, n, ((a(i,j),i=1,m),j=1,k),
> X ((b(i,j),i=1,k),j=1,n), ALPHA
> LDA=100
> LDB=101
> LDC=102
>
> CALL DGEMM ( 'N', 'N', M, N, K, ALPHA, A, LDA, B,
> X LDB, 0.D0, C, LDC )
>
> C note the subscript order in the next statement!
> WRITE(*,*) ((c(i,j),j=1,n),i=1,m)
>
>
> That will read in m, k, n and the two arrays a and b, and alpha, process
> them accordingly and print the result, for m and n less than or equal to
> 100, and k less than or equal to 101.
>
> Since the arrays are allocated before reading, it is unlikely that LDA,
> LDB, equal m, k, and with LDC=102, extremely unlikely
> that LDC will equal m. (The first subscript of C goes with m.)
>
> The arrays are read in the order they are stored in memory,
> but printed in the subscript reversed order, either of which could be
> changed by changing the implied DOs.
>
> Note that in this program m, k, n are read in the same statement where the
> arrays are read, which would make dynamic allocation difficult to do.
>
> -- glen
>



Ron Shepard

2005-02-12, 3:58 pm

In article <420dcfd8$0$48129$ed2619ec@ptn-nntp-reader02.plus.net>,
"Dawn Minnis" <dawn.minnis@btinternet.com> wrote:

> I mean, if you are wanting to perform multiplication of matrix A*B
> then why bother defining a larger matrix to begin with that takes up
> memory??


Adding allocate() to the fortran language certainly has changed the
way things are done. However, even so, it is still useful to be
able to operate with subblocks of matrices. There are many linear
algebra algorithms, for example, that are based on manipulations of
subblocks of matrices.

Another reason the blas were written that way is that there is no
performance penalty associated with adding LDA to the argument list.
That is, the programming interface could be designed with this extra
flexibility with no performance cost, so that design choice was a
win-win situation. On the other hand, allowing for arbitrary
increments between elements in a column of a matrix would have
resulted in a performance loss (or in very complicated code to
compensate for it when possible), so the designers choose not to go
in that direction. The cray library routine mxma() was an example
of a general matrix-matrix product routine that did allow this
flexibility, and on many machines it is very difficult to write an
mxma() replacement that runs efficiently. The fortran intrinsic
function matmul() also allows for this flexibility, and it sometimes
suffers in performance compared to the equivalent blas routine.

$.02 -Ron Shepard
James Giles

2005-02-14, 4:00 am

glen herrmannsfeldt wrote:
....
> Now that you know all about Fortran arrays and memory use, everything
> is completely different in C.
>
> First, C stores arrays with the rightmost subscript varying fastest,
> instead of leftmost as in Fortran. One reason is that it is closer
> to mathematical conventions, [...]


Mathematical conventions say nothing about "fastest varying" or
any such concept. The idea that left-most-fastest or right-most-fastest
is "more mathematical" is often repeated, but completely bogus.

--
J. Giles

"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare


Ron Shepard

2005-02-14, 4:00 am

User-Agent: MT-NewsWatcher/3.4 (PPC Mac OS X)
Date: Sun, 13 Feb 2005 23:33:45 -0600
Message-ID: <ron-shepard-07D16A.23334513022005@comcast.dca.giganews.com>
Lines: 47
NNTP-Posting-Host: 24.7.212.56
X-Trace: sv3- yF9XSowW558TqiK7TP+wO12YLXvq8yl6YDMj3T5e
tKQjlku5E7+3+F6PEyL4KbgWjtD9d4PJZKaEQ64!
q14NphtAj3vY8M2Q1SIX+Anbnjjzjtzfu7GDctLj
Az8+qrDWI8UcqEVmVjPHAmUCcRFUOBJrY1DO!VJH
nn35GgsNTGz4=
X-Complaints-To: abuse@comcast.net
X-DMCA-Complaints-To: dmca@comcast.net
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.23
Xref: number1.nntp.dca.giganews.com comp.lang.fortran:128350

In article
<usTPd.36098$Th1.23008@bgtnsc04-news.ops.worldnet.att.net>,
"James Giles" <jamesgiles@worldnet.att.net> wrote:

>
> Mathematical conventions say nothing about "fastest varying" or
> any such concept. The idea that left-most-fastest or right-most-fastest
> is "more mathematical" is often repeated, but completely bogus.


I would say it the other way. Fortran array indexing is closer to
the mathematical conventions than the C array indexing. For
example, consider the matrix product A times B (which is the subject
of this thread). It is natural to consider that as being equivalent
to a series of matrix-vector products of A times the columns of B.
Considering the column of B to be a "vector" is more consistent with
the fortran indexing convention than with the C convention (where a
row of B would be a vector).

In fact, C array indexing is so brain dead that most mathematical
routines don't even use it, they instead either duplicate the
fortran approach (offsets based on a single address), or they try to
fall back to a low-level scheme involving pointers to vectors or
pointers to pointers (e.g. Numerical Recipes), or they abandon the
mathematical conventions entirely and just work with the backward C
convention. It is only in the most recent version of C that
multidimensional arrays with variable dimensions are allowed, and if
I'm not mistaken, that indexing convention is *NOT* adopted by the
latest C++ standard. That feature is something that is pretty
fundamental to mathematical notation. Another important feature of
fortran array indexing is that the lower and upper bounds may be
specified with variables. If it is most natural to index an array
from -10 to +10, then that is the way you do it in fortran (for the
last 25 years, at least). In C, you are stuck with indexing arrays
of length N with indices from 0 to (N-1); it is hard to argue that
that is "closer to mathematical conventions" than the fortran
approach.

Fortran originally was an acronym for FORmula TRANslation, so there
is good reason for the close relation between standard mathematical
notation and fortran conventions. C was designed for system
programming, not mathematical operations, so its conventions were
adopted for other reasons.

$.02 -Ron Shepard
Ron Shepard

2005-02-14, 4:00 am

In article <d_6dnXmi_Nh1io3fRVn-vw@comcast.com>,
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:

> If you have a matrix with one row per line in a file, and columns
> on successive lines, and read it in using either list directed or
> formatted input in the natural Fortran ordering, which is row and which
> is column?


While this is true, it has nothing to do with DGEMM or with most
mathematical subroutines. A typical mathematical subroutine might
consist of dozens or hundreds of lines involving array subscripts
but not a single I/O operation on them. The C convention might
appear simpler for that rare I/O statement, but the fortran
convention is simpler for those dozens or hundreds of lines of code
that are actually doing the work. And if you are debugging the
code, is is almost certainly not an I/O operation that is the
problem.

I fairly seldom make mistakes getting fortran array indices
interchanged when I write code. I won't say it is never, but it is
rare. However, when I write C or Mathematica code, both of which
use the backward array index conventions, it is almost a coin toss
whether I get it right or not the first time. The reason is that it
just doesn't match up with the mathematics that I'm usually trying
to do. It is not just a matter of switching row and column indices,
it is also a question of which conventions the various subprograms
and built-in functions use. I've been programming C for over 20
years and Mathematica for about 15 years, so you might think I'd
catch on eventually, but the fact that I don't is evidence of a
serious design flaw in those languages for doing standard
mathematical operations.

$.02 -Ron Shepard
Dawn Minnis

2005-02-15, 9:00 pm

> Several reasons:
>
> - statically allocated arrays - now obsolete in most but not all places
> (think embedded systems)
> - padding for performance reasons - a memory subsystem might work much
> better
> if the stride is not a power of two, for instance (this is true on many
> Cray machines), so you would perform a 1024x1024 FFT on a 1025x1024
> array
> - there are algorithms, some recursive, that operate on smaller subarrays
> of the whole array. With the ability to supply any array element as the
> starting point for the subarray in an actual argument, this will work
> with
> the DGEMM interface even though there is no explicit provision for it.
>
> There are likely additional reasons...
>
> Jan


Mmmm, thank you. Lots of food for thought and I see I've instigated a
debate also. I think I have enough to go on to get started. Ultimately I
think I have a lot of background reading to do to determine why I am writing
the functions in a particular fashion. Think i might take a look at ATLAS
also, have so far stuck to looking at straightforward BLAS code from the
netlib site.

Thanks again

Dawn


beliavsky@aol.com

2005-02-16, 4:04 am

Dawn Minnis wrote:

> Mmmm, thank you. Lots of food for thought and I see I've instigated

a
> debate also. I think I have enough to go on to get started.

Ultimately I
> think I have a lot of background reading to do to determine why I am

writing
> the functions in a particular fashion. Think i might take a look at

ATLAS
> also, have so far stuck to looking at straightforward BLAS code from

the
> netlib site.


One thing DGEMM can do is compute

C = alpha*A*B + beta*C

where A, B, and C are matrices, A*B denotes matrix multiplication, and
alpha and beta are scalars.

In Fortran 90/95 one can just write

C = alpha*matmul(A,B) + beta*C

It's worth testing if an optimized DGEMM results in a significantly
faster program than the corresponding F90/95 array operations,
especially if the alternative is defenestration :).

Tim Prince

2005-02-16, 4:01 pm


"glen herrmannsfeldt" <gah@ugcs.caltech.edu> wrote in message
news:jqudncSxnsqlXI_fRVn-sQ@comcast.com...
> beliavsky@aol.com wrote:
> (snip)
>
>
>
>
>
>
>
> DGEMM does it without creating a temporary matrix, and presumably
> makes some restrictions on A, B, and C being separate matrices.
>
> Compilers may or may not create a temporary for the expression
> above, I would guess that they would, but it is hard to know.
>
> -- glen


It's hard to know what is done in "an optimized DGEMM" if a proprietary
version is meant. Temporary allocations aren't necessarily bad when
MATMUL() is involved. When the matrix is large enough that DGEMM is
efficient (e.g. 20x20), the cost of allocating some memory is relatively
negligible, unless it leads to an "out of memory" failure.


Dave Thompson

2005-02-21, 3:59 am

On Sun, 13 Feb 2005 23:33:45 -0600, Ron Shepard
<ron-shepard@NOSPAM.comcast.net> wrote:

> In article
> <usTPd.36098$Th1.23008@bgtnsc04-news.ops.worldnet.att.net>,
> "James Giles" <jamesgiles@worldnet.att.net> wrote:
>
<snip>[color=darkred]
> In fact, C array indexing is so brain dead that most mathematical
> routines don't even use it, they instead either duplicate the
> fortran approach (offsets based on a single address), or they try to
> fall back to a low-level scheme involving pointers to vectors or
> pointers to pointers (e.g. Numerical Recipes), or they abandon the
> mathematical conventions entirely and just work with the backward C
> convention. It is only in the most recent version of C that
> multidimensional arrays with variable dimensions are allowed, and if


In traditional C through C90 all bounds of an actual (defined =
allocated) array must be constant. For a dynamic (malloc'ed) array
(accessed by a pointer), or an array parameter (= dummy = formal) to a
function (= procedure) which is necessarily passed using a pointer
(effectively though not formally by reference) all dimensions except
the leftmost (= outermost in C as noted) must be constant. As I
believe has already been noted C arrays are not formally multidim
instead array-containing-array-etc., but except for a minor difference
in syntax this doesn't really change anything.

C99 allows (runtime) variable bounds for 'automatic' arrays in the C
meaning = allocated per function or block execution, not same as the
Fortran meaning; and for (automatic) pointers which includes
parameters. Statically _allocated_ arrays (that is at compile/link
time) still must have constant bounds, as is also true in Fortran --
it is impossible to use runtime data before runtime.

> I'm not mistaken, that indexing convention is *NOT* adopted by the
> latest C++ standard. That feature is something that is pretty


The latest C++ standard is the only formal one, C++98. 1998 is before
1999. Whether C++ will be revised to include the new features in C99,
of which VLA is only one, was still under discussion last I heard.

You could write in C++, and in fact I'd be astonished if someone and
probably many people haven't already, a class (or several) which
implements the equivalent of F9X assumed-shape and POINTERs by storing
bounds (and strides) and transparently computing the net subscript for
you all collapsed inline. It's not standard though, at least not yet.

> fundamental to mathematical notation. Another important feature of


I don't see how this is fundamental to _notation_. AFAICS mathematics
would be perfectly happy if all arrays were "allocated" as infinity or
maybe googol by same up to rank and just sliced down to M by N. Of
course it would be impossible to implement this on real machines, and
variable sizes are (more or less) important to nearly all serious
numeric computation. Maybe this is just terminology.

> fortran array indexing is that the lower and upper bounds may be
> specified with variables. If it is most natural to index an array
> from -10 to +10, then that is the way you do it in fortran (for the
> last 25 years, at least). In C, you are stuck with indexing arrays
> of length N with indices from 0 to (N-1); it is hard to argue that
> that is "closer to mathematical conventions" than the fortran
> approach.
>

Well, that both bounds may be specified at all is one feature, and
that it/they may be variable (except for statically-allocated = SAVEd
or COMMON) is another. Pascal did both bounds but only constants (with
some relaxation later IIRC equivalent to assumed shape). PL/I did both
bounds, variable in at least the cases I used, and optionally assumed
(but only for all dimensions). Ada does both bounds, potentially
variable, but you (usually?) have to introduce an extra type name for
it. See also above about C++.

> Fortran originally was an acronym for FORmula TRANslation, so there
> is good reason for the close relation between standard mathematical
> notation and fortran conventions. C was designed for system
> programming, not mathematical operations, so its conventions were
> adopted for other reasons.
>
> $.02 -Ron Shepard


- David.Thompson1 at worldnet.att.net
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com