
Author Summarize array with respect to values in another...
tripplowe@gmail.com

2006-09-19, 7:02 pm

Hey Folks,

Here's the deal...

I have a 1-dimensional array containing values I wish to summarize
using the values contained in another same-sized array. I have a third
array containing the unique values of the array I wish to summarize by.

This is how I'm doing it now:
....
real*4, dimension(1000):: datarray
integer*4, dimension(1000):: sumarray
integer*2, dimension(10):: sumvals/1,3,5,7,12,23,26,54,32,111/
real*4, dimension(10):: thestats
....
do k=1,10
   thestats(k) = real(sum(datarray,mask=sumarray .eq. sumvals(k))) / &
                 real(count(sumarray .eq. sumvals(k)))
enddo
....

Can y'all think of a better/faster way to do this? In reality, I am
trying to summarize million-record arrays by ~3000 classes, which takes
an extremely long time.

Thanks for your input.

glen herrmannsfeldt

2006-09-19, 7:02 pm

tripplowe@gmail.com wrote:

(snip)

> do k=1,10
> thestats(k) = real(sum(datarray,mask=sumarray .eq. sumvals(k))) /
> real(count(sumarray .eq. sumvals(k)))
> enddo


> Can ya'll think of a better/faster way to do this. In reality, I am
> trying to summarize million-record arrays by ~3000 classes which takes
> an extremely long time.


I would try it the old-fashioned way with nested loops, arranged so
that the million-entry loop is on the outside.

It isn't completely obvious what order to do them in, but order matters.

The one you have may generate one or two temporary arrays before doing
the actual sum.
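
A minimal sketch of what the nested-loop arrangement could look like, assuming free-form Fortran 90/95. The accumulators total and n, the program name, and the test data are made up for illustration; the other names follow the original post. Each data element is visited once, and the inner search stops at the first matching class.

```fortran
program class_means_nested
  implicit none
  integer, parameter :: ndata = 1000, nclass = 10
  real    :: datarray(ndata), thestats(nclass), total(nclass)
  integer :: sumarray(ndata), sumvals(nclass), n(nclass)
  integer :: i, k

  sumvals = (/ 1, 3, 5, 7, 12, 23, 26, 54, 32, 111 /)

  ! Made-up test data: cycle through the class values.
  do i = 1, ndata
     sumarray(i) = sumvals(mod(i - 1, nclass) + 1)
     datarray(i) = real(i)
  end do

  total = 0.0
  n = 0
  do i = 1, ndata                 ! the long loop is on the outside
     do k = 1, nclass
        if (sumarray(i) == sumvals(k)) then
           total(k) = total(k) + datarray(i)
           n(k) = n(k) + 1
           exit                   ! class values are unique, stop searching
        end if
     end do
  end do

  ! Guard against empty classes before dividing.
  where (n > 0)
     thestats = total / real(n)
  elsewhere
     thestats = 0.0
  end where

  print *, thestats
end program class_means_nested
```

This still compares each element against up to nclass values, but it reads the data in storage order exactly once and needs no temporary mask arrays. Whether it is actually faster should be checked by timing it.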

-- glen

Richard E Maine

2006-09-19, 7:02 pm

glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:

> tripplowe@gmail.com wrote:


>
> I would try it the old-fashioned way with nested loops...


I'd agree with Glen here. A sufficiently smart compiler might be able
to figure out how to do this all efficiently, but without testing it, I
wouldn't bet on whether any particular compiler does or not.

My general advice (echoing that of many people before me) is to first
write code with an emphasis on clarity, rather than spending a lot of
time trying to out-guess optimizers. Sometimes optimizers are "smarter"
than people realize, and attempts to hand-optimize can easily be
counterproductive. I'd agree the above is reasonably clear.

But if you have a small piece of code that is having a noticeable
measured performance impact, it can be worth trying the hand
optimizations. Of course, be sure to measure to make sure that your hand
optimization helps instead of hurts. This code snippet is quite small
enough to fiddle with, and Glen's suggestion seems like an obvious
thing to try. It has the secondary benefit of likely being very portable
(in terms of both performance and getting the right answers), while it
is compiler-dependent how well the original is likely to perform.

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain
tripplowe@gmail.com

2006-09-19, 7:02 pm

Ok, the 'old-fashioned' way? I don't suppose you could elaborate just
a bit?
The only other way I can think of doing this is looping through the
data and using an if/then/else to calculate the means...

Thanks...

Dick Hendrickson

2006-09-19, 7:02 pm



tripplowe@gmail.com wrote:
> Ok, the 'old-fashioned' way? I don't suppose you could elaborate just
> a bit?
> The only other way I can think of doing this is looping through the
> data and using an if/then/else to calculate the means...
>
> Thanks...

My newsreader has lost your original post, so this is from
memory. I think you wanted to sum up a partial subset of an
array, where the subset element numbers were given by an
additional array. Try something like:

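real :: the_array(1000), x, average
integer :: list(10), i
the_array = whatever          ! fill with your data
list = interesting_values     ! subscripts of the elements to sum
...
x = 0.0                       ! initialize the accumulator
do i = 1,size(list)
   x = x + the_array(list(i))
enddo
average = x/size(list)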

If I remember rightly, your case had two dimensional arrays,
but I think this can be easily generalized.

You could also do
x= sum(the_array(list))
which is the same as the above DO loop.

The key to both is using your selection array as a
subscript, rather than as a mask to select items.
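
As a hedged illustration of the subscript approach, the sketch below fills the_array with made-up values and checks that the explicit loop and the vector-subscript form give the same sum. Note that this selects elements by position, so it applies when list holds subscripts into the_array; the data and the contents of list here are assumptions for the example.

```fortran
program vector_subscript_demo
  implicit none
  real    :: the_array(1000), x, average
  integer :: list(10), i

  ! Made-up data for illustration only.
  do i = 1, size(the_array)
     the_array(i) = real(i)
  end do
  list = (/ 1, 3, 5, 7, 12, 23, 26, 54, 32, 111 /)

  ! Explicit loop over the selected subscripts.
  x = 0.0
  do i = 1, size(list)
     x = x + the_array(list(i))
  end do
  average = x / size(list)

  ! the_array(list) is a vector-subscripted section holding the same elements.
  print *, 'loop sum             =', x
  print *, 'vector-subscript sum =', sum(the_array(list))
  print *, 'average              =', average
end program vector_subscript_demo
```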

Dick Hendrickson

glen herrmannsfeldt

2006-09-19, 7:02 pm

tripplowe@gmail.com wrote:
> Ok, the 'old-fashioned' way? I don't suppose you could elaborate just
> a bit?
> The only other way I can think of doing this is looping through the
> data and using an if/then/else to calculate the means...


Yes, loops and IF statements. In the end, the array expressions
need to be converted to loops, anyway. In some cases they should
be just about as efficient as explicit loops, in other cases they
will generate some temporary arrays, and fill them in a less efficient
order. It is usually fastest to order the loops such that array elements
are processed in the order they are stored in memory. With
complicated array expressions that may not happen, and even worse
temporary arrays might be used when they would not otherwise be
needed.

It shouldn't take too long to rewrite that as loops and do
some simple timing.
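
A rough sketch of such a timing test, assuming the standard CPU_TIME intrinsic (Fortran 95) is available. The array sizes, the random data, and the use of class values 1..nclass are made up for the example; real timings will depend on the compiler and options.

```fortran
program time_mask_vs_loop
  implicit none
  integer, parameter :: ndata = 1000000, nclass = 100
  real, allocatable    :: datarray(:)
  integer, allocatable :: sumarray(:)
  real    :: stats_mask(nclass), stats_loop(nclass), total(nclass)
  integer :: n(nclass), i, k
  real    :: t0, t1, t2

  allocate(datarray(ndata), sumarray(ndata))
  call random_number(datarray)
  do i = 1, ndata
     sumarray(i) = mod(i, nclass) + 1   ! classes 1..nclass, evenly filled
  end do

  ! Version 1: masked intrinsics, at least one pass over the data per class.
  call cpu_time(t0)
  do k = 1, nclass
     stats_mask(k) = sum(datarray, mask=sumarray == k) / &
                     real(count(sumarray == k))
  end do
  call cpu_time(t1)

  ! Version 2: explicit loops and an IF, with the long loop outside.
  total = 0.0
  n = 0
  do i = 1, ndata
     do k = 1, nclass
        if (sumarray(i) == k) then
           total(k) = total(k) + datarray(i)
           n(k) = n(k) + 1
           exit
        end if
     end do
  end do
  stats_loop = total / real(n)          ! every class is non-empty here
  call cpu_time(t2)

  print *, 'masked intrinsics (s):', t1 - t0
  print *, 'explicit loops    (s):', t2 - t1
  print *, 'max difference       :', maxval(abs(stats_mask - stats_loop))
end program time_mask_vs_loop
```

Printing the largest difference between the two results is a cheap check that the rewrite still computes the same means, apart from rounding.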

-- glen
tripplowe@gmail.com

2006-09-19, 7:02 pm

I bow to yall's foresight. Unraveling it, instead of using the
intrinsic SUM, reduced processing time to mere minutes.

Thank you very much for your help.
-Tripp

glen herrmannsfeldt wrote:
> tripplowe@gmail.com wrote:
>
> Yes, loops and IF statements. In the end, the array expressions
> need to be converted to loops, anyway. In some cases they should
> be just about as efficient as explicit loops, in other cases they
> will generate some temporary arrays, and fill them in a less efficient
> order. It is usually fastest to order the loops such that array elements
> are processed in the order they are stored in memory. With
> complicated array expressions that may not happen, and even worse
> temporary arrays might be used when they would not otherwise be
> needed.
>
> It shouldn't take too long to rewrite that as loops and do
> some simple timing.
>
> -- glen


Richard E Maine

2006-09-19, 7:02 pm

<tripplowe@gmail.com> wrote:

> I bow to yall's foresight. Unraveling it, instead of using the
> intrinsic SUM, reduced processing time to mere minutes.


Glad we (well, Glen gets most of the credit on this one) could help. But
for my part, and I suspect his also, I think it counts more as hindsight
than foresight. Been there before. My signature is relevant. :-)

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain
Jan Vorbrüggen

2006-09-20, 4:01 am

> thestats(k) = real(sum(datarray,mask=sumarray .eq. sumvals(k))) /
> real(count(sumarray .eq. sumvals(k)))


I suspect the sum and the count, which have identical expressions inside
them, will either result in the data being walked twice, or will generate
a temporary array that is then walked twice. Both of these options are bad
for performance. It would be interesting to see whether there are optimizers
around which are able to turn this into the equivalent IF-based loop.
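
For reference, a small sketch of the equivalent IF-based loop for a single class, which gathers the sum and the count in one walk over the data. The tiny arrays, target_class, total and n are illustrative assumptions, not part of the original code.

```fortran
program fused_sum_count
  implicit none
  real    :: datarray(8), total
  integer :: sumarray(8)
  integer :: i, n, target_class

  ! Made-up data: eight values in three classes.
  datarray = (/ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0 /)
  sumarray = (/ 3, 5, 3, 7, 5, 3, 7, 3 /)
  target_class = 3

  ! One pass: accumulate the sum and the count together.
  total = 0.0
  n = 0
  do i = 1, size(datarray)
     if (sumarray(i) == target_class) then
        total = total + datarray(i)
        n = n + 1
     end if
  end do

  if (n > 0) print *, 'one-pass mean  =', total / real(n)

  ! The original formulation, for comparison.
  print *, 'intrinsic mean =', &
       sum(datarray, mask=sumarray == target_class) / &
       real(count(sumarray == target_class))
end program fused_sum_count
```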

Jan
tripplowe@gmail.com

2006-09-20, 8:01 am

If I may, I'd like to ask a follow-up question.

Let's say I am using the intrinsic SUM/MASK function on an array to
calculate the sum of values in 'myarray' for each value in 'catarray':

do i=1,10
   thesum(i) = sum(myarray, mask=catarray .eq. i)
enddo

When this is compiled, is the intrinsic function unrolled to something
like the following?

do i=1,10
   do k=1,size(myarray)
      if (catarray(k) .eq. i) then
         thesum(i) = thesum(i) + myarray(k)
      endif
   enddo
enddo

Essentially looping through all of the data 10 times?

If it matters, I'm using the Lahey 95 compiler.

Jan Vorbrüggen

2006-09-20, 8:01 am

> do i=1,10
>    thesum(i) = sum(myarray, mask=catarray .eq. i)
> enddo
>
> When this is compiled, is the intrinsic function unrolled to something
> like the following?
>
> do i=1,10
>    do k=1,size(myarray)
>       if (catarray(k) .eq. i) then
>          thesum(i) = thesum(i) + myarray(k)
>       endif
>    enddo
> enddo
>
> Essentially looping through all of the data 10 times?


That seems likely, yes.

If you have the constraint that the catarray only contains the values of
1 to 10, and you want to have the sum for each of these values, the correct
approach IMO would be to have a loop over myarray and the following
statement:

thesum(catarray(k)) = thesum(catarray(k)) + myarray(k)
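
A hedged sketch of how this idea might carry over to class values that are not simply 1 to 10, such as the sumvals list from the first post. It assumes every class value is a positive integer no larger than max_class; the lookup table slot, the accumulators total and n, and the test data are hypothetical names and values introduced for the example.

```fortran
program class_means_direct
  implicit none
  integer, parameter :: ndata = 1000, nclass = 10
  integer, parameter :: max_class = 111     ! assumed largest class value
  real    :: datarray(ndata), total(nclass), thestats(nclass)
  integer :: sumarray(ndata), sumvals(nclass), n(nclass)
  integer :: slot(max_class)
  integer :: i, k

  sumvals = (/ 1, 3, 5, 7, 12, 23, 26, 54, 32, 111 /)

  ! Made-up test data: cycle through the class values.
  do i = 1, ndata
     sumarray(i) = sumvals(mod(i - 1, nclass) + 1)
     datarray(i) = real(i)
  end do

  ! Lookup table: class value -> position in sumvals (0 = unknown class).
  slot = 0
  do k = 1, nclass
     slot(sumvals(k)) = k
  end do

  ! Single pass over the data, no inner search.
  total = 0.0
  n = 0
  do i = 1, ndata
     k = slot(sumarray(i))        ! assumes 1 <= sumarray(i) <= max_class
     if (k > 0) then
        total(k) = total(k) + datarray(i)
        n(k) = n(k) + 1
     end if
  end do

  ! Means per class, guarding against empty classes.
  where (n > 0)
     thestats = total / real(n)
  elsewhere
     thestats = 0.0
  end where

  print *, thestats
end program class_means_direct
```

The work per element no longer depends on the number of classes, at the cost of a table as long as the largest class value. If the class values were large or negative, a different mapping (for example sorting or a search) would be needed instead.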

Jan
tripplowe@gmail.com

2006-09-20, 7:01 pm

Thank you again. All of ya'll have been a great help.
-Tripp

Jan Vorbrüggen wrote:
>
> That seems likely, yes.
>
> If you have the constraint that the catarray only contains the values of
> 1 to 10, and you want to have the sum for each of these values, the correct
> approach IMO would be to have a loop over myarray and the following
> statement:
>
> thesum(catarray(k)) = thesum(catarray(k)) + myarray(k)
>
> Jan

