Extremely slow ALLOCATABLE array
| robert.funnellNOSPAM@NOSPAMmcgill.ca 2004-11-29, 4:07 pm |
| I have a case in which use of an ALLOCATABLE array is much much slower
than using a static array. The following is a small test programme
that demonstrates the problem:
      IMPLICIT NONE
      INTEGER, PARAMETER :: NN=860
      INTEGER, PARAMETER :: NPLANES=1
      INTEGER, PARAMETER :: NPIXH=NN
      INTEGER, PARAMETER :: NPIXV=NN
      INTEGER, PARAMETER :: NSLICE=NN
      INTEGER IROW,ICOL,ISLICE,IPLANE,ISTATUS
      INTEGER(1),ALLOCATABLE :: STACK_XZ(:,:,:,:)
C
      INTEGER(1) :: PIX_XZ(NPLANES,NPIXH,NSLICE)
C
c     INTEGER(1),ALLOCATABLE :: PIX_XZ(:,:,:)
c     ALLOCATE(PIX_XZ(NPLANES,NPIXH,NSLICE),STAT=ISTATUS)
c     IF(ISTATUS.NE.0) STOP 1
C
      ALLOCATE(STACK_XZ(NPLANES,NPIXH,NSLICE,NPIXV),STAT=ISTATUS)
      IF(ISTATUS.NE.0) STOP 2
C
      OPEN(UNIT=0,CARRIAGECONTROL='FORTRAN')
C
      WRITE(0,'('' Copying'',I5,'' images''/X)') NPIXV
      DO IROW=1,NPIXV
        IF(MOD(IROW,10).EQ.0) WRITE(0,'(''+'',I5)') IROW
c       PIX_XZ=STACK_XZ(:,:,:,IROW)
        DO ISLICE=1,NSLICE
          DO ICOL=1,NPIXH
            DO IPLANE=1,NPLANES
              PIX_XZ(IPLANE,ICOL,ISLICE)=
     *          STACK_XZ(IPLANE,ICOL,ISLICE,IROW)
            END DO
          END DO
        END DO
      END DO
C
      STOP
C
      END
In the form shown here, with a static array, the run time is too short
to measure with a watch; if I comment out the static declaration of
PIX_XZ and uncomment the ALLOCATABLE declaration and its ALLOCATE, the
run time is ~13 seconds. This is on an Alpha with 768 MB using Compaq
Fortran V1.2.0 under Linux.
On an Intel laptop with 512 MB and CVF 6.6.C, the allocatable version
gets to row 550 out of 860 in about half a second then takes over 30
seconds to get to 860, paging heavily; the static version finishes in
much less than half a second. The results are about the same if I use
the commented line PIX_XZ=STACK_XZ(:,:,:,IROW) rather than the
explicit DO loops.
I don't understand why the allocatable array should be so much slower.
Can anyone offer insight? Is this something that would be fixed if I
updated my compilers?
This may be a FAQ, in which case I apologise, but I haven't found the
answer anywhere, and what I'm observing seems to be more extreme than
what I've seen described. The original context is handling CT-scan
data with up to 1024 slices, each being 1024x1024.
- Robert
| |
| glen herrmannsfeldt 2004-11-29, 4:07 pm |
| robert.funnellNOSPAM@NOSPAMmcgill.ca wrote:
> I have a case in which use of an ALLOCATABLE array is much much slower
> than using a static array. The following is a small test programme
> that demonstrates the problem:
(snip of program)
> In the form shown here, with a static array, the run time is too short
> to measure with a watch; if I comment out the static declaration of
> PIX_XZ and uncomment the ALLOCATABLE declaration and its ALLOCATE, the
> run time is ~13 seconds. This is on an Alpha with 768 MB using Compaq
> Fortran V1.2.0 under Linux.
I don't see that you ever initialize the array. That shouldn't
matter, but I have heard stories where it did.
In one story a very large dynamically allocated array in a C
program had all its memory mapped to a single page by the
memory management system. The system then allocates a real
page for it when data is actually written to it. There is
a possibility that a similar case exists here.
Otherwise, it seems that you are going through the loops
with the leftmost subscript varying fastest, as one is expected
to do in Fortran. I believe that is still true for
ALLOCATABLE arrays.
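As a minimal sketch of the loop-ordering point above: Fortran stores arrays with the leftmost subscript contiguous in memory, for static and ALLOCATABLE arrays alike. The module and function names below are hypothetical, chosen only for illustration. Both functions return the same sum; only the memory access pattern differs.

```fortran
module loop_order
  implicit none
contains
  ! Inner loop runs over the first (leftmost) subscript: stride-1 access.
  function sum_leftmost_fastest(a) result(s)
    integer, intent(in) :: a(:,:)
    integer :: s, i, j
    s = 0
    do j = 1, size(a, 2)
       do i = 1, size(a, 1)
          s = s + a(i, j)
       end do
    end do
  end function sum_leftmost_fastest

  ! Inner loop runs over the last subscript: strided access, same result.
  function sum_rightmost_fastest(a) result(s)
    integer, intent(in) :: a(:,:)
    integer :: s, i, j
    s = 0
    do i = 1, size(a, 1)
       do j = 1, size(a, 2)
          s = s + a(i, j)
       end do
    end do
  end function sum_rightmost_fastest
end module loop_order

program demo_loop_order
  use loop_order
  implicit none
  integer, allocatable :: a(:,:)

  allocate(a(500, 500))   ! size is an arbitrary choice for the demo
  a = 1
  print *, sum_leftmost_fastest(a), sum_rightmost_fastest(a)
end program demo_loop_order
```

The test programme in the original post already uses the first ordering, so loop order is unlikely to explain the slowdown.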
> On an Intel laptop with 512 MB and CVF 6.6.C, the allocatable version
> gets to row 550 out of 860 in about half a second then takes over 30
> seconds to get to 860, paging heavily; the static version finishes in
> much less than half a second. The results are about the same if I use
> the commented line PIX_XZ=STACK_XZ(:,:,:,IROW) rather than the
> explicit DO loops.
I already snipped the code, so hopefully other readers can see
it in the original post if they need it.
The array is dimensioned (1,860,860,860) of, I will assume,
one-byte integers. That is over 600 megabytes, so one should
not be surprised to see a lot of paging.
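The size estimate can be checked with a few lines of Fortran. This is only a sketch: the helper name array_bytes is hypothetical, and it assumes INTEGER(1) occupies one byte, as it does with the compilers mentioned in this thread. The extents are taken from the declarations in the original post.

```fortran
module array_size
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Bytes needed for an array with the given extents and element size.
  ! The product is formed in 64-bit arithmetic to avoid overflow.
  pure function array_bytes(dims, elem_bytes) result(nbytes)
    integer, intent(in) :: dims(:)
    integer, intent(in) :: elem_bytes
    integer(int64) :: nbytes
    nbytes = product(int(dims, int64)) * elem_bytes
  end function array_bytes
end module array_size

program stack_size
  use array_size
  implicit none
  ! Extents as declared in the test programme (NN = 860, NPLANES = 1).
  print '(a,i0,a)', 'STACK_XZ: ', array_bytes([1, 860, 860, 860], 1), ' bytes'
  print '(a,i0,a)', 'PIX_XZ:   ', array_bytes([1, 860, 860], 1), ' bytes'
end program stack_size
```

STACK_XZ comes to 636,056,000 bytes, which is more than the 512 MB of physical memory on the laptop described above.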
> I don't understand why the allocatable array should be so much slower.
> Can anyone offer insight? Is this something that would be fixed if I
> updated my compilers?
I am suspicious that your static case is not really doing
a real copy. If you really have two 600MB arrays allocated
you should be getting a lot of swapping in both cases.
The case I usually worry about slowing down dynamic arrays
is allocate/deallocate inside a loop.
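To illustrate the allocate/deallocate-in-a-loop pattern, here is a rough sketch with hypothetical names and arbitrary sizes. It is not taken from the original programme, which allocates only once. Both functions compute the same total; the first pays for an allocation on every pass, the second allocates once and reuses the buffer.

```fortran
module alloc_patterns
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Allocates and frees the work array on every iteration.
  function total_with_realloc(n, nrep) result(total)
    integer, intent(in) :: n, nrep
    integer(int64) :: total
    integer, allocatable :: work(:)
    integer :: i
    total = 0
    do i = 1, nrep
       allocate(work(n))
       work = i
       total = total + sum(int(work, int64))
       deallocate(work)
    end do
  end function total_with_realloc

  ! Allocates the work array once, outside the loop, and reuses it.
  function total_with_reuse(n, nrep) result(total)
    integer, intent(in) :: n, nrep
    integer(int64) :: total
    integer, allocatable :: work(:)
    integer :: i
    total = 0
    allocate(work(n))
    do i = 1, nrep
       work = i
       total = total + sum(int(work, int64))
    end do
    deallocate(work)
  end function total_with_reuse
end module alloc_patterns

program demo_alloc
  use alloc_patterns
  implicit none
  print *, total_with_realloc(1000, 1000), total_with_reuse(1000, 1000)
end program demo_alloc
```

How much the first form costs depends on the compiler and runtime library, so it is worth timing both on the system in question.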
Please try running it with initialization of the arrays.
Also, what OS are you running this on?
> This may be a FAQ, in which case I apologise, but I haven't found the
> answer anywhere, and what I'm observing seems to be more extreme than
> what I've seen described. The original context is handling CT-scan
> data with up to 1024 slices, each being 1024x1024.
-- glen
| |
| robert.funnellNOSPAM@NOSPAMmcgill.ca 2004-11-29, 4:07 pm |
| On Sun, 28 Nov 2004, glen herrmannsfeldt wrote:
> ...
> I don't see that you ever initialize the array. That shouldn't
> matter, but I have heard stories where it did.
> ...
I inserted the lines PIX_XZ(:,:,:)=0 and STACK_XZ(:,:,:,:)=0 .
After a significant delay to initialise the big array (STACK_XZ) the
subsequent run times were about the same as before, for both static
and ALLOCATABLE forms of the programme on my Alpha.
> ...
> The array is dimensioned (1,860,860,860) of, I will assume
> one byte integers. That is over 600 megabytes, so one should
> not be surprised to see a lot of paging.
> ...
> I am suspicious that your static case is not really doing
> a real copy. If you really have two 600MB arrays allocated
> you should be getting a lot of swapping in both cases.
> ...
Actually only one of the arrays is 600 MB, the other one is only
860x860 rather than 860x860x860.
But your suspicion turns out to be correct. When I inserted a WRITE
statement at the end, to output a few elements of PIX_XZ, both static
and ALLOCATABLE forms of the programme slowed down, and their run
times are now about equal.
I saw a reference to such a thing somewhere but thought it must have
been facetious. I guess not. This test programme was distilled from a
real programme which was horribly slow and which I was hoping to speed
up. I guess I distilled too much. Unfortunately I was left with the
water rather than the whiskey :-(
Thanks for the nudge in the right direction.
- Robert
| |
| Arjen Markus 2004-11-29, 4:07 pm |
| robert.funnellNOSPAM@NOSPAMmcgill.ca wrote:
>
>
> But your suspicion turns out to be correct. When I inserted a WRITE
> statement at the end, to output a few elements of PIX_XZ, both static
> and ALLOCATABLE forms of the programme slowed down, and their run
> times are now about equal.
>
Maybe this is a case where the compiler decided it could "optimise"
a few statements and leave out a lot of processing as a result!
Adding the writes then upsets that kind of optimisation, forcing it
to do what it should do.
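A timing test can guard against this by making the program's output depend on the copied data. The following is a minimal sketch under stated assumptions: the names are hypothetical, the stack is simplified to three dimensions, and the size is kept small (8 MB) so that paging does not interfere. The checksum is accumulated after every slab copy so that each copy has a visible effect.

```fortran
module slab_copy
  use, intrinsic :: iso_fortran_env, only: int8, int64
  implicit none
contains
  ! Copy each slab of stack into a 2-D buffer in turn and return a
  ! running checksum, so that no copy can be discarded as unused.
  function copy_all_checksum(stack) result(check)
    integer(int8), intent(in) :: stack(:,:,:)
    integer(int64) :: check
    integer(int8), allocatable :: pix(:,:)
    integer :: irow
    allocate(pix(size(stack, 1), size(stack, 2)))
    check = 0
    do irow = 1, size(stack, 3)
       pix = stack(:,:,irow)
       check = check + sum(int(pix, int64))
    end do
  end function copy_all_checksum
end module slab_copy

program timed_copy
  use, intrinsic :: iso_fortran_env, only: int8, int64
  use slab_copy
  implicit none
  integer, parameter :: n = 200      ! arbitrary demo size, not 860
  integer(int8), allocatable :: stack(:,:,:)
  integer(int64) :: t0, t1, rate, check

  allocate(stack(n, n, n))
  stack = 1_int8                     ! initialise so every page is touched

  call system_clock(t0, rate)
  check = copy_all_checksum(stack)
  call system_clock(t1)

  ! Printing the checksum is what makes the copies observable.
  print '(a,i0)', 'checksum = ', check
  print '(a,f8.3,a)', 'elapsed  = ', real(t1 - t0) / real(rate), ' s'
end program timed_copy
```

Even with the checksum, an aggressive optimiser may restructure the loop, so comparing timings at more than one optimisation level is a sensible cross-check.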
> I saw a reference to such a thing somewhere but thought it must have
> been facetious. I guess not. This test programme was distilled from a
> real programme which was horribly slow and which I was hoping to speed
> up. I guess I distilled too much. Unfortunately I was left with the
> water rather than the whiskey :-(
>
> Thanks for the nudge in the right direction.
>
> - Robert
Just a thought:
if you were to turn the last dimension into a dimension of a structure,
then you would split up the big array into pieces. Perhaps the compiler
can then make a better judgment of the access patterns:
type BIGARRAY
    real, dimension(:,:,:), pointer :: array
endtype
type(BIGARRAY), dimension(:), pointer :: full_data
allocate( full_data(1:1000) )
do i = 1,1000
    allocate( full_data(i)%array(100,100,100) )
enddo
(Just an illustration)
It may work, it may even be worse.
Regards,
Arjen