Performance Improvement - Need help
Lee Newman 2005-11-17, 7:06 pm
Dear Group,
I am working with a computational model that has a main loop which
executes about 10^6 to 10^7 times over the course of a simulation --
taking about 30 hrs. The bottleneck function (below) includes an outer
product and some matrix algebra. I have optimized it to the best of my
knowledge, but would desperately like to know if any further optimization
might be possible (including calling external functions in C or another
language). Any suggestions would be greatly appreciated.
FUNCTION ---------------------------------------------------------
UpdateSynapses = Compile[{{matrix, _Real, 2}, {vector1, _Real, 1},
{vector2,_Real, 1}, {thresh1, _Real}, {thresh2, _Real}, {C1, _Real},
{C2, _Real}, {maxval, _Real}},
Module[{coactivation},
coactivation = Outer[Times,
FloorZero[vector2-thresh2], FloorZero[vector1- thresh1]];
C2* maxval*coactivation + (1 - C2* coactivation - C1)*matrix
] (* end module *)
, {{FloorZero[__], _Real, 1}} ];
Notes:
(1) vector1 is 1x100; vector2 is 1x1500; matrix is 100x100; matrix2 is
100x1500; all vectors/matrices are comprised of reals (range 0 to 1)
and are packed.
(2) FloorZero=Compile[{{list, _Real, 1}}, UnitStep[list] * list].
Eliminating this function does not significantly affect performance.
(3) run time ~ 30hrs for 10^7 iterations (Pentium 4, 2.8GHz, 1GB RAM)
Regards,
Lee Newman
Carl K. Woll 2005-11-19, 7:57 am
Lee Newman wrote:
> Dear Group,
>
> I am working with computational model that has a main loop which
> executes about 10^6 to 10^7 times over the course of a simulation --
> taking about 30 hrs. The bottleneck function (below) includes and outer
> product and some matrix algebra. I have optimized it to the best of my
> knowledge, but would desperately like to know if any further optimization
> might be possible (including calling external functions in C or other
> language). Any suggestions would be greatly appreciated.
>
> FUNCTION ---------------------------------------------------------
>
> UpdateSynapses = Compile[{{matrix, _Real, 2}, {vector1, _Real, 1},
> {vector2,_Real, 1}, {thresh1, _Real}, {thresh2, _Real}, {C1, _Real},
> {C2, _Real}, {maxval, _Real}},
>
> Module[{coactivation},
>
> coactivation = Outer[Times,
> FloorZero[vector2-thresh2], FloorZero[vector1- thresh1]];
>
> C2* maxval*coactivation + (1 - C2* coactivation - C1)*matrix
(The final *, just before matrix, should be . not *, I think.)
>
> ] (* end module *)
>
> , {{FloorZero[__], _Real, 1}} ];
>
> Notes:
> (1) vector1 is 1x100; vector2 is 1x1500; matrix is 100x100; matrix2 is
> 100x1500; all vectors/matrices are comprised of reals (range 0 to 1)
> and are packed.
> (2) FloorZero=Compile[{{list, _Real, 1}}, UnitStep[list] * list].
> Eliminating this
> function does not significantly affect performance.
> (2) run time ~ 30hrs for 10^7 iterations (Pentium 4, 2.8GHz, 1GB RAM)
>
> Regards,
> Lee Newman
Lee,
Some comments.
1. Use Clip[vector-thresh,{0,10}] instead of FloorZero. It's a bit
faster, and a bit clearer to me at least.
2. Your coactivation matrix can be thought of as the dot product of a
column vector and a row vector. In this light, the dot product of
coactivation.matrix can be thought of as
c . (r . matrix)
instead of
(c . r) . matrix
Now, the dot product of a vector with a matrix is usually much faster
than the dot product of a matrix with a matrix, so this ought to provide
some speed gain.
3. The only thing left to worry about is the 1-C1 part of the matrix
product (1-C1-C2 coactivation).matrix. Since coactivation is a 1500x100
matrix, 1-C1 is really a 1500x100 matrix where all entries are 1-C1. It
turns out that the (1-C1).matrix part is really just 1500 copies of
Total[m].
4. We end up with the outer product of a 1500 element column vector with
a 100 element row vector, and then to each row we add the same 100
element row vector. It turns out that instead of Outer, it's a bit
faster to use Map.
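The four comments above can be checked on small arrays. What follows is a minimal sketch, not code from the thread: the names c, r, mat, k, row, tol and x are illustrative, 4- and 3-element vectors stand in for the 1500- and 100-element ones, and k plays the role of 1-C1. If the identities hold, each entry of the final list should be True.

```mathematica
(* Illustrative check on small arrays; none of these names come from the thread *)
SeedRandom[2];
c = Table[Random[], {4}];          (* stands in for the clipped vector2 *)
r = Table[Random[], {3}];          (* stands in for the clipped vector1 *)
mat = Table[Random[], {3}, {3}];   (* stands in for matrix *)
k = 0.3;                           (* stands in for 1 - C1 *)
tol = 10^-12;

(* Comment 1: UnitStep[x] x and Clip[x, {0, 10}] agree for x below 10 *)
x = {-0.5, -0.1, 0., 0.2, 0.9};
clipOK = (UnitStep[x] x == Clip[x, {0, 10}]);

(* Comment 2: (c r).mat equals the outer product of c with r.mat *)
assocOK = Max[Abs[Outer[Times, c, r].mat - Outer[Times, c, r.mat]]] < tol;

(* Comment 3: a constant matrix dotted with mat repeats k Total[mat] in every row *)
row = k Total[mat];
constOK = Max[Abs[Table[k, {4}, {3}].mat - Table[row, {4}]]] < tol;

(* Comment 4: mapping over c builds the same rows as Outer plus the repeated row *)
mapOK = Max[Abs[(((r.mat) # + row &) /@ c) -
      (Outer[Times, c, r.mat] + Table[row, {4}])]] < tol;

{clipOK, assocOK, constOK, mapOK}
```

The comparisons use a tolerance rather than == because the two sides of each identity sum the same products in a different order, so they may differ in the last few bits.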
Putting the above ideas together, I came up with the following
uncompiled function:
update[m_, v1_, v2_, t1_, t2_, c1_, c2_, max_] :=
Module[{f1, f2, i1, i2},
f1 = c2 Clip[v1 - t1, {0, 10}];
f2 = Clip[v2 - t2, {0, 10}];
i1 = max f1 - f1.m;
i2 = (1 - c1)Total[m];
(i1# + i2 &) /@ f2]
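For readers matching the code to the comments, the algebra can be written out as follows. This is a sketch under some notational assumptions that are not in the original post: a and b denote the clipped vectors, M is the 100x100 matrix, K is the coactivation matrix, bold 1_n is a column of n ones, and i_1, i_2 are treated as row vectors.

```latex
\begin{aligned}
a &= \mathrm{Clip}(v_1 - t_1), \qquad b = \mathrm{Clip}(v_2 - t_2), \qquad K = b\,a^{\top} \\
R &= c_2\,\mathit{max}\,K + \bigl((1-c_1)\,\mathbf{1}_{1500}\mathbf{1}_{100}^{\top} - c_2\,K\bigr)\,M \\
  &= b\,\bigl(\mathit{max}\,f_1^{\top} - f_1^{\top} M\bigr)
     + \mathbf{1}_{1500}\,\bigl((1-c_1)\,\mathbf{1}_{100}^{\top} M\bigr),
     \qquad f_1 = c_2\,a \\
  &= b\,i_1 + \mathbf{1}_{1500}\,i_2,
     \qquad i_1 = \mathit{max}\,f_1^{\top} - f_1^{\top} M,
     \quad i_2 = (1-c_1)\,\mathbf{1}_{100}^{\top} M
\end{aligned}
```

Here the product of the ones row with M is the vector of column sums, which is what Total[m] returns, and row j of R is b_j i_1 + i_2, which is what the final Map over f2 builds.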
Here is some test data:
SeedRandom[1];
m = Table[Random[], {100}, {100}];
v1 = Developer`ToPackedArray@Table[Random[], {100}];
v2 = Table[Random[], {1500}];
{t1, t2, c1, c2, max} = Table[Random[], {5}];
Let's make sure the matrices and vectors are packed:
In[9]:=
Developer`PackedArrayQ/@{m,v1,v2}
Out[9]=
{True, True, True}
Now, comparing update with UpdateSynapses:
In[10]:=
Do[r1=update[m,v1,v2,t1,t2,c1,c2,max],{100}]//Timing
Do[r2=UpdateSynapses[m,v1,v2,t1,t2,c1,c2,max],{100}]//Timing
r1==r2
Out[10]=
{1.516 Second, Null}
Out[11]=
{5.078 Second, Null}
Out[12]=
True
At least on my slow machine, update is more than 3 times faster. If you
experience the same speedup, then it should take less than 10 hours.
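The estimate can be reproduced from the two timings above. It is a rough projection only, assuming that the whole 30 hour run scales with this one function and that the 100-iteration timings are representative of the full simulation.

```latex
\text{speedup} = \frac{5.078\ \text{s}}{1.516\ \text{s}} \approx 3.35,
\qquad
\text{projected run time} \approx \frac{30\ \text{h}}{3.35} \approx 9\ \text{h}
```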
Carl Woll
PS. The version of UpdateSynapses I used is:
UpdateSynapses=Compile[{
{matrix,_Real,2},
{vector1,_Real,1},
{vector2,_Real,1},
{thresh1,_Real},
{thresh2,_Real},
{C1,_Real},
{C2,_Real},
{maxval,_Real}
},
Module[{coactivation},
coactivation=Outer[
Times,
FloorZero[vector2-thresh2],
FloorZero[vector1-thresh1]
];
C2*maxval*coactivation+(1-C2*coactivation-C1).matrix],
{{FloorZero[__],_Real,1}}];
In[2]:=
FloorZero=Compile[{{list,_Real,1}},UnitStep[list]*list];