Code Comments

Programming Forum and web based access to our favorite programming groups.
For Programmers: Free Programming Magazines | New: Database administration forum
Registration is free! Edit your profileCalendarFind other membersFrequently Asked QuestionsSearch -> 
Post New Thread











Thread
Author

Derived data types vs computation time
Derived data types are one of the more useful new features of modern
Fortran.  However, a few months ago I had to revert critical sections
of one program back to separate arrays when I discovered that accesses
to derived type components were heavily impacting computation times
(IIRC replacing with a few standard arrays reduced the computation
time for a large section of code by a factor of about 5).

At the time I was a bit "annoyed" but did not take too much notice -
just please to significantly reduce the computation time.

But I am now finding the experience is making me reluctant to use
derived types anywhere near computationally intensive sections of code
(or even moderately intensive code).  Frustrating because the
alternative of multiple separate arrays can be cumbersome, untidy, and
harder to follow.  The latter problems can be avoided with suitable
array naming- eg:  site_ID, site_name, site_lat, site_long,
site_elevation, etc. (indeed is more legible than the silly, often
near illegible component selector "%" used for derived types.)
Separate arrays are not so convenient though in other situations - eg:
site(i) = tempsite; and READ(line,*) site(i), become a lot messier
with separate arrays, especially where there are multiple components.

The main reason for this post is to find out whether the computational
deficiencies I encountered with derived types are typical, or compiler
specific (ivf8), or only occur under specific circumstances (mixed
type components, etc)?  Any suggestions?

David







Report this thread to moderator Post Follow-up to this message
Old Post
dud_address@xparadise.net.nz
10-05-04 09:04 AM


Re: Derived data types vs computation time
dud_address@xparadise.net.nz wrote:

> Derived data types are one of the more useful new features of modern
> Fortran.  However, a few months ago I had to revert critical sections
> of one program back to separate arrays when I discovered that accesses
> to derived type components were heavily impacting computation times
> (IIRC replacing with a few standard arrays reduced the computation
> time for a large section of code by a factor of about 5).

> At the time I was a bit "annoyed" but did not take too much notice -
> just please to significantly reduce the computation time.

The cache on modern processors assumes locality of reference,
that is, if you reference one data item you will likely soon
reference the following data item.  If you reference one part
in an array of structures in a computation, but not others then
it might slow down by a large factor.

You might be able to use a structure of arrays, though that
might not help so much.  There are stories of C compilers on
some SPEC benchmarks that internally convert between array
of structures and structure of arrays, though it is not
normally allowed by the standard.  (With some kinds of pointer
arithmetic you can tell the difference.  The compiler has to
know that the program doesn't do that.)

It might be that you can copy to another array, do the
computation, and copy back faster than in the structure.

-- glen


Report this thread to moderator Post Follow-up to this message
Old Post
glen herrmannsfeldt
10-05-04 09:04 AM


Re: Derived data types vs computation time
On Tue, 05 Oct 2004 14:31:41 +1300,
dud_address@xparadise.net.nz <dud_address@xparadise.net.nz>
wrote in <ssq3m0h385guifna0f7trqemgc4gos2j04@4ax.com>:

> The main reason for this post is to find out whether the computational
> deficiencies I encountered with derived types are typical, or compiler
> specific (ivf8), or only occur under specific circumstances (mixed
> type components, etc)?  Any suggestions?

One thing to be aware of with many types of aggregates (derived
types, common blocks, etc) is that the order of the components may be
important.  Specifically if an entity is at an unsuitable address boundary
(say, a 4-byte integer at an address that's not divisible by 4), some
modern processors take a performance penalty to retrieve the data, e.g.
make two separate reads and then reconstruct the misaligned object.  One
should strive to order the members in descending order of size to allow
them to be always aligned on their "natural" boundaries.

Similarly, if the aggregate's size is not a multiple of its largest
member's, you should not force arrays to be packed in memory but allow the
compiler to start each element on its natural boundary.


--
Ivan Reid, Electronic & Computer Engineering,     ___     CMS  Collaboration
,
Brunel University.     Ivan.Reid@brunel.ac.uk             Room 40-1-B12, CER
N
KotPT -- "for stupidity above and beyond the call of duty".

Report this thread to moderator Post Follow-up to this message
Old Post
Dr Ivan D. Reid
10-05-04 09:03 PM


Re: Derived data types vs computation time
> There are stories of C compilers on some SPEC benchmarks that
> internally convert between array of structures and structure of
> arrays, though it is not normally allowed by the standard.

That would be the Sun C compiler mangling art, which is probably
the worst element of SPEC CFP2000 in any case. Apart from the
whole-program analysis required to re-write it a-of-s to s-of-a,
it also does coalescing of malloc() calls, which apparently also
gains a substantial factor.

Jan

Report this thread to moderator Post Follow-up to this message
Old Post
Jan Vorbrüggen
10-05-04 09:03 PM


Re: Derived data types vs computation time
> The main reason for this post is to find out whether the computational
> deficiencies I encountered with derived types are typical, or compiler
> specific (ivf8), or only occur under specific circumstances (mixed
> type components, etc)?  Any suggestions?

Some considerations:

- Alignment: depending on the order of components and the specification,
or not, of SEQUENCE in the definition, the compiler may or may be forced
to generate slow code for accessing some or all components. General
rule: order by size, larger items first.

- A second alignment issue might occur when the total size does not fit
well with the alignment required by the hardware - this can happen in
arrays of structures when, for instance, the second array element is
mis-aligned because of the size of the first element. Padding to some
suitable total size can help here. Also, adverse interactions with
cache line sizes or replacement policies can occur.

- If you access most or all of a structures components in one basic block,
an array of structures will usually be best. If, on the other hand, you
first access multiple - perhaps even consecutive - elements of one com-
ponent, a structure of arrays usually will be best.

- I agree that there is this problem of handling one element's bits and
pieces in the case of a structure of arrays. Perhaps you can build two
helper functions that, given an element number, scatter and gather the
components to and from the arrays...sort of achieving the best of both
worlds.

In the end, it's a matter of performance measurement to decide between
the two approaches.

Jan

Report this thread to moderator Post Follow-up to this message
Old Post
Jan Vorbrüggen
10-05-04 09:03 PM


Re: Derived data types vs computation time
On Tue, 05 Oct 2004 14:31:41 +1300, dud_address@xparadise.net.nz wrote:

>The main reason for this post is to find out whether the computational
>deficiencies I encountered with derived types are typical, or compiler
>specific (ivf8), or only occur under specific circumstances (mixed
>type components, etc)?  Any suggestions?

You have not provided examples of what you are referring to, so I am not sur
e
of the context.  I would not expect any noticeable performance penalty for
accessing components of a derived type variable.  If you have a specific
example that shows a problem in an Intel compiler, we'd love to see it at
Intel Premier Support.  One issue we are aware of is operations involving
pointer or allocatable array components, where sometimes unnecessary array
copies are made.

Steve Lionel
Software Products Division
Intel Corporation
Nashua, NH

User communities for Intel Software Development Products
http://softwareforums.intel.com/
Intel Fortran Support
http://developer.intel.com/software/products/support/

Report this thread to moderator Post Follow-up to this message
Old Post
Steve Lionel
10-05-04 09:03 PM


Re: Derived data types vs computation time
Jan Vorbrüggen wrote:

> That would be the Sun C compiler mangling art, which is probably
> the worst element of SPEC CFP2000 in any case. Apart from the
> whole-program analysis required to re-write it a-of-s to s-of-a,
> it also does coalescing of malloc() calls, which apparently also
> gains a substantial factor.

How can that be ?  SPEC programs are supposed to spend only a small
fraction of their time in libraries not supplied (and thus compiled)
with the program.

--
Toon Moene - e-mail: toon@moene.indiv.nluug.nl - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
Maintainer, GNU Fortran 77: http://gcc.gnu.org/onlinedocs/g77_news.html
A maintainer of GNU Fortran 95: http://gcc.gnu.org/fortran/

Report this thread to moderator Post Follow-up to this message
Old Post
Toon Moene
10-06-04 01:58 AM


Re: Derived data types vs computation time
On Tue, 05 Oct 2004 09:55:49 -0400, Steve Lionel
<Steve.Lionel@REMOVEintelME.com> wrote:

>On Tue, 05 Oct 2004 14:31:41 +1300, dud_address@xparadise.net.nz wrote:
> 
>
>You have not provided examples of what you are referring to, so I am not su
re
>of the context.  I would not expect any noticeable performance penalty for
>accessing components of a derived type variable.  If you have a specific
>example that shows a problem in an Intel compiler, we'd love to see it at
>Intel Premier Support.  One issue we are aware of is operations involving
>pointer or allocatable array components, where sometimes unnecessary array
>copies are made.
>
>Steve Lionel
>Software Products Division
>Intel Corporation
>Nashua, NH
>
>User communities for Intel Software Development Products
>  http://softwareforums.intel.com/
>Intel Fortran Support
>  http://developer.intel.com/software/products/support/


Thanks for the various replies.

In respect of "examples", the actual code in question is as below.  Do
not seem to any alignment issues?

The attraction of derived types is the ability to hang related data
together in a rational manner; eg. in this case EQ date & time,
location, depth, magnitude, etc.  It would be nice if I can use them
without significant impact on computation time; eg  data for a given
event (i) can easily be read via  READ(line,*) EQ(i) - not so simple
with a structure of arrays?

Steve, in respect of Intel Premier Support, my experience with that to
date is not encouraging.  This is probably not the place to go into
detail, so I will send you an email directly.

David



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Subroutine code below

NB: This is an internal subroutine, hence not all variables declared
here - only first half of subroutine below - should be adequate to
provide context.

As indicated in my original post, substituting the
<<<< OFFENDING LINE >>>> with

rsq = (EQ_east(i)-znx(j))**2 + (EQ_north(i)-zny(j))**2

had a large impact on the time for this code to run - despite the
"log10"  & "EXP" in the rest of the code.

Compiler is ivf8.05.  Because there is a tight deadline on this work,
I am very reluctant to change to 8.1 midstream, especially as a number
of people seem to have had problems.  For the same reason, I am
holding back on applying SP2 for WinXP.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SUBROUTINE GAUSSIAN ( EQ, nEQ, znx, zny, nG0, nG1, gridA, gridB,
cSmooth, rmax )

IMPLICIT NONE
integer i, j
integer na, nb
real    pi

! EQ Catalogue data
integer nEQ

! EQ source zones
integer nG0, nG1
real    znx(*), zny(*)

!  EQ model variables
real    cSmooth, csq, const1
real    rmax, rmax1, rmax2, rsq
real    sumxy_b, sumx_b, sumy_b, sumxsq_b
real    sumeqs, sumx_a, sumy_a
real    x, y, z, eqs, freq

real    gridA(nG1), gridB(nG1)

TYPE :: eqRec
integer year
integer month
integer day
integer time
real    x
real    y
integer depth
real    mag
real    mSD
END TYPE eqRec
TYPE(eqRec) EQ(nEQ)
!=====================================

! Model constants
rmax1 = rmax          ! limit on extent of spreading
rmax2 = rmax**2
csq = cSmooth*cSmooth ! SQ of smoothing parameter
pi = 3.1415926535
const1 = csq*pi

loopGrid: DO j = nG0, nG1 ! Loops source grid
sumxy_b  = 0.0
sumx_b   = 0.0
sumy_b   = 0.0
sumxsq_b = 0.0
freq     = 0.0
sumeqs = 0.0
sumx_a = 0.0
sumy_a = 0.0
na = 0
nb = 0

loopEQ: do i = 1, nEQ

!  <<<< OFFENDING LINE >>>>
rsq = (EQ(i)%x-znx(j))**2 + (EQ(i)%y-zny(j))**2
!  <<<< OFFENDING LINE >>>>

if(rsq > rmax2) CYCLE loopEQ
! probably c. 10% of instances pass this test

nb = nb+1
x = EQ(i)%mag
z = trec
freq = freq + 1./z
y = log10(freq)
sumxy_b = sumxy_b + x*y
sumx_b = sumx_b + x
sumy_b = sumy_b + y
sumxsq_b = sumxsq_b + x*x

eqs = (EXP(-rsq/csq))/const1/z
if (eqs < 0.1E-20) CYCLE loopEq

sumeqs = sumeqs + eqs
y = log10(sumeqs)
na = na + 1
sumy_a = sumy_a + y
sumx_a = sumx_a + x
end do loopEQ
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




Report this thread to moderator Post Follow-up to this message
Old Post
dud_address@xparadise.net.nz
10-06-04 08:59 AM


Re: Derived data types vs computation time
>> That would be the Sun C compiler mangling art, which is probably 
> How can that be ?  SPEC programs are supposed to spend only a small
> fraction of their time in libraries not supplied (and thus compiled)
> with the program.

From the standard's point of view, malloc() is part of the language,
just as ALLOCATE is part of the Fortran language. I think the formal
restriction is that something 98 % of a SPEC CPU program should be in
user mode.

Jan

Report this thread to moderator Post Follow-up to this message
Old Post
Jan Vorbrüggen
10-06-04 08:59 AM


Re: Derived data types vs computation time
>                !  <<<< OFFENDING LINE >>>>
>       rsq = (EQ(i)%x-znx(j))**2 + (EQ(i)%y-zny(j))**2
>                 !  <<<< OFFENDING LINE >>>>
>
>       if(rsq > rmax2) CYCLE loopEQ
>                ! probably c. 10% of instances pass this test

This is one of the situations I described: You are accessing only a
small part of your structure EQ(i) here - two consecutive elements -
in 90% of the cases, if I understand your comment correctly. Given
that on any access to memory that is not already cached, a cache line's
worth of data - typically 32-64 bytes - will be fetched, in the case
above you are fetching the whole structure (approximately), of which
only a quarter is being used. In the alternative case, one fetch will
load consecutive elements of the _north and _east arrays, which will
be used on the next three turns through the loop. Given that your program
is limited by bandwidth to memory, for a 32-byte cache line expect a
speedup of 4, and for a 64-byte cache line a speedup of 4.5 - which seems
to fit with what you have observed.

Given this access pattern, do the following: Define two structure types,
EQ_coord and EQ_data, the first containing the current components x and
y, and the second containing the other current structure elements. That
should give the performance of the structure of arrays, while having much
less handling issues - you just need to read, write, copy... two items
instead of a single one.

Jan

Report this thread to moderator Post Follow-up to this message
Old Post
Jan Vorbrüggen
10-06-04 08:59 AM


Sponsored Links




Last Thread Next Thread Next
Pages (3): [1] 2 3 »
Search this forum -> 
Post New Thread

Fortran archive

Show a Printable Version Send to friend Email This Page to Someone! subscribe to this thread Receive updates to this thread
Computer Consultants
Programming Jobs
Visual Basic Controls
SQL Server Programming
Webservices
Java Security
Visual Studio
C# Programming
Visual J++
Software engineering
Open source Software
Perl Programming
PHP Programming
ASP Programming
ASP .NET Programming
Visual Basic Programming
Windows Scripting Host
Java Programming
Java Help
Java Beans
VBScript
Cobol
MAC Applications
Unix Programming
Forum Jump:
All times are GMT. The time now is 05:51 PM.

 
Free MCSE Braindumps | Real Estate Topics

Programming forum archive

Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.