Code Comments

Programming Forum and web based access to our favorite programming groups.
For Programmers: Free Programming Magazines | New: Database administration forum
Registration is free! Edit your profileCalendarFind other membersFrequently Asked QuestionsSearch -> 
Post New Thread











Thread
Author

Re: Array optimizing problem in C++?
On Sun, 23 Mar 2008 10:23:07 -0700 (PDT), courpron@gmail.com wrote:

>cl /O2 /GL prog.cpp /link /ltcg

And there was no speed difference between

cl /O2 /GL prog.cpp /link /ltcg

and  cl /O2 /prog.cpp  in this case.



Report this thread to moderator Post Follow-up to this message
Old Post
Razii
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
On Sun, 23 Mar 2008 13:22:59 -0500, Razii
<DONTwhatevere3e@hotmail.com> wrote:

>And there was no speed difference between
>
>cl /O2 /GL prog.cpp /link /ltcg

And the time I am getting for (VC++) is

Time smooth(): 5391 ms
Time smooth(): 5375 ms
Time smooth(): 5453 ms
Time smooth(): 5421 ms

For Java

C:\>java -server Arrays
Time smooth(): 5610 ms

C:\>java -server Arrays
Time smooth(): 5625 ms

C:\>java -server Arrays
Time smooth(): 5562 ms

Note that I am using server VM (-server). The client VM is a bit
slower in this case

After changing double to int,

(vc++ with cl /O2 /GL prog.cpp /link /ltcg)

Time smooth(): 1000 ms
Time smooth(): 1000 ms
Time smooth(): 1000 ms

and

C:\>java -server Arrays
Time smooth(): 2541 ms


Report this thread to moderator Post Follow-up to this message
Old Post
Razii
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
jason.cipriani@gmail.com wrote:
> If you are going for blinding speed be sure to use proper optimization
> flags; otherwise you are only calculating benchmarks for poorly
> optimized code, which, like most of your benchmarks, is meaningless:
>
> $ g++ -O0 smooth.cpp
> 9884.867 ms
>
> $ g++ -O3 smooth.cpp
> 8791.123 ms
>
> $ g++ -O3 -mfpmath=sse -msse3 -march=prescott -ffast-math smooth.cpp
> 1207.944 ms

GCC's -ffast-math option breaks semantics, so it is not a valid
optimization. I would not use it in a comparison against Java unless you
know that accuracy is not required for the given task.

> $ g++ -O3 -mfpmath=sse -msse3 -march=prescott -ffast-math -funroll-
> loops smooth.cpp
> 1084.385 ms
>
> This is your code slightly modified to use QueryPerformanceCounter for
> timing (using Windows). Most of the speed up is a result of -ffast-
> math.
>
> Based on the quality and rigor of the other tests I've seen you do,
> I'd guess that you missed a few optimization flags on the Java side as
> well.

I have altered all implementations of this benchmark to perform progressive
blurring from one array to the other and back rather than overwriting the
destination array repeatedly. This slightly improved performance in all
cases.

On my Athlon 64 X2 4400+ (2.2Ghz) running AMD64 Debian Linux, I get
best-of-three results:

g++:   4.080  g++ -O3 bench.cpp -o bench; time ./bench
g++2:  4.080  g++ -O3 bench2.cpp -o bench2; time ./bench2
ocaml: 4.048  ocamlopt bench.ml -o bench; time ./bench
java:  4.016  javac Arrays.java; time java Arrays

On the same hardware running 32-bit Windows XP Pro, I get:

F#:    3.970  fsc -O3

This shows an insignificant difference in performance which, if nothing
else, is interesting because the second C++ program uses bounds checking. I
was also surprised to see OCaml do so well on this benchmark because it
makes no attempt to hoist bounds checks. F# on .NET typically beats all
open source compilers on Linux on this kind of benchmark (float array
intensive). C# presumably would too.

The bench2 program is the following, more idiomatic, C++ version that uses
bounds checking:

#include <vector>
#include <iostream>
#include <cstdlib>
#include <ctime>

void smooth (std::vector<double> &dest, std::vector<double> &src)
{
for (int i = 1; i < src.size() - 1; i++)
dest.at(i - 1) = (src.at(i - 1) + src.at(i) + src.at(i + 1)) / 3 ;
}

void fill (std::vector<double> &src)
{
srand((unsigned)std::time(0));

for (int i = 0; i < src.size(); ++ i )
src.at(i) = rand();
}

int main()
{
const int len = 50000;

std::vector<double> src_array(len);
std::vector<double> dest_array(len);

fill(src_array);

clock_t start=clock();

for (int i = 0; i < 5000; i++)
{
smooth(dest_array, src_array);
smooth(src_array, dest_array);
}

clock_t endt=clock();

std::cout <<"Time smooth(): " <<
double(endt-start)/CLOCKS_PER_SEC * 1000 << " ms\n";

return 0;
}

Here is my OCaml equivalent:

let smooth dst src =
let n = Array.length dst in
for i=1 to n-2 do
dst.(i-1) <- (src.(i-1) +. src.(i) +. src.(i+1)) /. 3.
done

let () =
let n = 50000 in
let src = Array.init n (fun _ -> Random.float 1.) in
let dst = Array.create n 0. in
let t = Sys.time() in
for i=1 to 5000 do
smooth dst src;
smooth src dst;
done;
Printf.printf "Time smooth(): %fs\n" (Sys.time() -. t)

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?u

Report this thread to moderator Post Follow-up to this message
Old Post
Jon Harrop
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
jason.cipriani@gmail.com wrote:
> If you are going for blinding speed be sure to use proper optimization
> flags; otherwise you are only calculating benchmarks for poorly
> optimized code, which, like most of your benchmarks, is meaningless:
>
> $ g++ -O0 smooth.cpp
> 9884.867 ms
>
> $ g++ -O3 smooth.cpp
> 8791.123 ms
>
> $ g++ -O3 -mfpmath=sse -msse3 -march=prescott -ffast-math smooth.cpp
> 1207.944 ms

GCC's -ffast-math option breaks semantics, so it is not a valid
optimization. I would not use it in a comparison against Java unless you
know that accuracy is not required for the given task.

> $ g++ -O3 -mfpmath=sse -msse3 -march=prescott -ffast-math -funroll-
> loops smooth.cpp
> 1084.385 ms
>
> This is your code slightly modified to use QueryPerformanceCounter for
> timing (using Windows). Most of the speed up is a result of -ffast-
> math.
>
> Based on the quality and rigor of the other tests I've seen you do,
> I'd guess that you missed a few optimization flags on the Java side as
> well.

I have altered all implementations of this benchmark to perform progressive
blurring from one array to the other and back rather than overwriting the
destination array repeatedly. This slightly improved performance in all
cases.

On my Athlon 64 X2 4400+ (2.2Ghz) running AMD64 Debian Linux, I get
best-of-three results:

g++:   4.080  g++ -O3 bench.cpp -o bench; time ./bench
g++2:  4.080  g++ -O3 bench2.cpp -o bench2; time ./bench2
ocaml: 4.048  ocamlopt bench.ml -o bench; time ./bench
java:  4.016  javac Arrays.java; time java Arrays

On the same hardware running 32-bit Windows XP Pro, I get:

F#:    3.970  fsc -O3

This shows an insignificant difference in performance which, if nothing
else, is interesting because the second C++ program uses bounds checking. I
was also surprised to see OCaml do so well on this benchmark because it
makes no attempt to hoist bounds checks. F# on .NET typically beats all
open source compilers on Linux on this kind of benchmark (float array
intensive). C# presumably would too.

The bench2 program is the following, more idiomatic, C++ version that uses
bounds checking:

#include <vector>
#include <iostream>
#include <cstdlib>
#include <ctime>

void smooth (std::vector<double> &dest, std::vector<double> &src)
{
for (int i = 1; i < src.size() - 1; i++)
dest.at(i - 1) = (src.at(i - 1) + src.at(i) + src.at(i + 1)) / 3 ;
}

void fill (std::vector<double> &src)
{
srand((unsigned)std::time(0));

for (int i = 0; i < src.size(); ++ i )
src.at(i) = rand();
}

int main()
{
const int len = 50000;

std::vector<double> src_array(len);
std::vector<double> dest_array(len);

fill(src_array);

clock_t start=clock();

for (int i = 0; i < 5000; i++)
{
smooth(dest_array, src_array);
smooth(src_array, dest_array);
}

clock_t endt=clock();

std::cout <<"Time smooth(): " <<
double(endt-start)/CLOCKS_PER_SEC * 1000 << " ms\n";

return 0;
}

Here is my OCaml equivalent:

let smooth dst src =
let n = Array.length dst in
for i=1 to n-2 do
dst.(i-1) <- (src.(i-1) +. src.(i) +. src.(i+1)) /. 3.
done

let () =
let n = 50000 in
let src = Array.init n (fun _ -> Random.float 1.) in
let dst = Array.create n 0. in
let t = Sys.time() in
for i=1 to 5000 do
smooth dst src;
smooth src dst;
done;
Printf.printf "Time smooth(): %fs\n" (Sys.time() -. t)

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?u

Report this thread to moderator Post Follow-up to this message
Old Post
Jon Harrop
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
On Mar 23, 6:54=A0pm, Razii <DONTwhatever...@hotmail.com> wrote:
> On Sun, 23 Mar 2008 10:29:44 -0700 (PDT), James Kanze
>
> <james.ka...@gmail.com> wrote: 
> 
>
> Post an example that I can test where aliasing is a problem. In the
> example that I posted, c++ was faster.

The smooth function must not be inlined. You can control this inlining
property in C++ but not in java afaik. Then, you can't directly
compare the performances of both language this way if you disable
inlining just for one. But I can show you the difference between 2 C++
programs, one caring about pointer aliasing and one applying no
aliasing optimization, by following those steps :

1. we will use a simpler version of the function smooth ( of course
the name "smooth" is now meaningless ).
2. we will make it not inlined in visual C++ to disable static
analysis of pointer aliasing
3. we won't pass the length as a parameter to the smooth, but use it
as a global variable. The compiler will do loop unrolling and caching
array access will be easier.
4. We must make a dummy call to the smooth function with dest_array as
an argument for both the first and the second parameter of foo. Global
analysis then won't detect that all calls to foo have no aliasing
issue.

Here is the modified version of your program :

#include <iostream>
#include <ctime>

//#define NO_ALIASING_OPTIMIZATION

const int len =3D 50000;

__declspec(noinline)
#ifdef NO_ALIASING_OPTIMIZATION
void smooth (int*   dest,  int *  src )
#else
void smooth (int*  __restrict dest,  int * __restrict src )
#endif
{
for ( int i =3D 0 ; i < len - 2 ; i++ )
dest[ i ] =3D src[ i ] + src[ i + 1 ] + src[ i + 2 ];
}

void fill (int* src)
{
for (int i =3D 0 ; i < len ; ++ i )
src[i] =3D i;
}


int main()
{

int src_array [len] =3D {0} ;
int dest_array [len] =3D {0};

fill(src_array);

smooth (dest_array, dest_array); // dummy call

clock_t start=3Dclock();

for (int i =3D 0; i < 10000; i++)
smooth (dest_array, src_array);


clock_t endt=3Dclock();

std::cout <<"Time smooth(): " <<
double(endt-start)/CLOCKS_PER_SEC * 1000 << " ms\n";

// doesn't work without the following cout on vc++
std::cout << dest_array [0] ;

return 0;
}


Uncomment the line :
//#define NO_ALIASING_OPTIMIZATION
to see the performance gain with the no aliasing optimization.


Alexandre Courpron.


Report this thread to moderator Post Follow-up to this message
Old Post
courpron@gmail.com
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
On Mar 23, 9:41=A0pm, courp...@gmail.com wrote:
> On Mar 23, 6:54=A0pm, Razii <DONTwhatever...@hotmail.com> wrote:
> 
> 
> 
> 
>
> The smooth function must not be inlined. You can control this inlining
> property in C++ but not in java afaik. Then, you can't directly
> compare the performances of both language this way if you disable
> inlining just for one. But I can show you the difference between 2 C++
> programs, one caring about pointer aliasing and one applying no
> aliasing optimization, by following those steps :
>
> 1. we will use a simpler version of the function smooth ( of course
> the name "smooth" is now meaningless ).
> 2. we will make it not inlined in visual C++ to disable static
> analysis of pointer aliasing
> 3. we won't pass the length as a parameter to the smooth, but use it
> as a global variable. The compiler will do loop unrolling and caching
> array access will be easier.
> 4. We must make a dummy call to the smooth function with dest_array as
> an argument for both the first and the second parameter of foo. Global
> analysis then won't detect that all calls to foo have no aliasing
> issue.
>
> Here is the modified version of your program :
>
> #include <iostream>
> #include <ctime>
>
> //#define NO_ALIASING_OPTIMIZATION
>
> const int len =3D 50000;
>
> __declspec(noinline)
> #ifdef NO_ALIASING_OPTIMIZATION
> void smooth (int* =A0 dest, =A0int * =A0src )
> #else
> void smooth (int* =A0__restrict dest, =A0int * __restrict src )
> #endif
> {
> =A0 =A0 for ( int i =3D 0 ; i < len - 2 ; i++ )
> =A0 =A0 =A0 =A0dest[ i ] =3D src[ i ] + src[ i + 1 ] + src[ i + 2 ];
>
> }
>
> void fill (int* src)
> {
> =A0 =A0 for (int i =3D 0 ; i < len ; ++ i )
> =A0 =A0 =A0 =A0 src[i] =3D i;
>
> }
>
> int main()
> {
>
> int src_array [len] =3D {0} ;
> int dest_array [len] =3D {0};
>
> fill(src_array);
>
> smooth (dest_array, dest_array); // dummy call
>
> clock_t start=3Dclock();
>
> for (int i =3D 0; i < 10000; i++)
> smooth (dest_array, src_array);
>
> clock_t endt=3Dclock();
>
> =A0 =A0 =A0 =A0 std::cout <<"Time smooth(): " <<
> =A0 =A0 =A0 =A0 =A0 double(endt-start)/CLOCKS_PER_SEC * 1000 << " ms\n";
>
> // doesn't work without the following cout on vc++
> std::cout << dest_array [0] ;
>
> return 0;
>
> }
>
> Uncomment the line :
> //#define NO_ALIASING_OPTIMIZATION
> to see the performance gain with the no aliasing optimization.
>
> Alexandre Courpron.

There is one mistake in the above code, in the smooth function:

instead of :
#ifdef NO_ALIASING_OPTIMIZATION
it should be :
#ifndef NO_ALIASING_OPTIMIZATION

the whole smooth function then should be :

__declspec(noinline)
#ifndef NO_ALIASING_OPTIMIZATION
void smooth (int*   dest,  int *  src )
#else
void smooth (int*  __restrict dest,  int * __restrict src )
#endif
{
for ( int i =3D 0 ; i < len - 2 ; i++ )
dest[ i ] =3D src[ i ] + src[ i + 1 ] + src[ i + 2 ];
}


Alexandre Courpron.

Report this thread to moderator Post Follow-up to this message
Old Post
courpron@gmail.com
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
On Mar 23, 4:41 pm, courp...@gmail.com wrote:
> The smooth function must not be inlined. You can control this inlining
> property in C++ but not in java afaik. Then, you can't directly
> compare the performances of both language this way if you disable
> inlining just for one. But I can show you the difference between 2 C++
> programs, one caring about pointer aliasing and one applying no
> aliasingoptimization, by following those steps :

What version of GCC are you using? I had already tried this earlier on
the original post; the version with and without __restrict not only
produced identical times, but generated identical code as well
(diff'ing objdump disassembly). I am using MinGW GCC 3.4.5; it's could
be the case that this version ignores __restrict.

I also had tested a version that inlined smooth(), and used the same
array for input and output -- using the first half for output, with
index offsets inside the loop, something to the effect of:

double buf[len * 2];
int i, r;
copy(src, src + len, buf + len);

for (r = 0; r < 10000; ++ r)
for (i = 1 ; i < len - 1 ; i++ )
buf[ i - 1 ] = (buf[ len + i - 1 ] +
buf[ len + i ] + buf[ len+ i + 1 ]) / 3 ;

It gave identical timings as well.

Jason

Report this thread to moderator Post Follow-up to this message
Old Post
jason.cipriani@gmail.com
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
In article < MIednRYljpBPQ3vanZ2dnUVZ_rLinZ2d@comcast
.com>,
andreytarasevich@hotmail.com says...

[ ... ]

> I don't really see much sense in comparing Java performance with C++
> performance. What might be more interesting is the effect of the 'restrict
'
> specifier supported by C99 compilers (and many C89/90 and C++ compielers a
s an
> extension), which is intended to assist compiler in performing exactly thi
s kind
> of optimizations.

Of, if you're really interested in C++, you could throw in a comparison
use a valarray, which was designed to give the same sort of assurance.
Unfortunately, I don't know of anybody who seems to have gone to much
(if any) trouble to optimize valarray at all -- rather the contrary, it
was an idea that even its own creator admits was in the wrong place at
the wrong time, so it's ignored almost to death, so to speak.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Report this thread to moderator Post Follow-up to this message
Old Post
Jerry Coffin
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
On Mar 23, 10:28=A0pm, "jason.cipri...@gmail.com"
<jason.cipri...@gmail.com> wrote:
> On Mar 23, 4:41 pm, courp...@gmail.com wrote:
> 
>
> What version of GCC are you using? I had already tried this earlier on
> the original post; the version with and without __restrict not only
> produced identical times, but generated identical code as well
> (diff'ing objdump disassembly). I am using MinGW GCC 3.4.5; it's could
> be the case that this version ignores __restrict.
>
> I also had tested a version that inlined smooth(), and used the same
> array for input and output -- using the first half for output, with
> index offsets inside the loop, something to the effect of:
>
> =A0 double buf[len * 2];
> =A0 int i, r;
> =A0 copy(src, src + len, buf + len);
>
> =A0 for (r =3D 0; r < 10000; ++ r)
> =A0 =A0 for (i =3D 1 ; i < len - 1 ; i++ )
> =A0 =A0 =A0 buf[ i - 1 ] =3D (buf[ len + i - 1 ] +
> =A0 =A0 =A0 =A0 =A0 =A0 =A0buf[ len + i ] + buf[ len+ i + 1 ]) / 3 ;
>
> It gave identical timings as well.
>
> Jason

You must make sure that :
- the smooth function is not inlined
- the compiler applies loop unrolling, to make the caching process
easier.

I quickly test the modified program I gave earlier, with GCC 4.3 .
Apparently, GCC has difficulty to do partial loop unrolling, I then
reduced the number of iterations in the smooth function. I might
investigate this issue tomorrow. Anyway, here is the program for GCC
that should give different results depending on the use of the
"restrict" keyword :

#include <iostream>
#include <ctime>

//#define NO_ALIASING_OPTIMIZATION

const int len =3D 20;

__attribute__((noinline))
#ifndef NO_ALIASING_OPTIMIZATION
void smooth (int*   dest,  int *  src )
#else
void smooth (int*  __restrict dest,  int * __restrict src )
#endif
{
for ( int i =3D 0 ; i < 17 ; ++i )
dest[ i ] =3D src[ i ] + src[ i + 1 ] + src[ i + 2 ];
}

void fill (int* src)
{
for (int i =3D 0 ; i < len ; ++ i )
src[i] =3D i;
}


int main()
{

int src_array [len] =3D {0} ;
int dest_array [len] =3D {0};

fill(src_array);

smooth (dest_array, dest_array); // dummy call

clock_t start=3Dclock();

for (int i =3D 0; i < 100000000; i++)
smooth (dest_array, src_array);


clock_t endt=3Dclock();

std::cout <<"Time smooth(): " <<
double(endt-start)/CLOCKS_PER_SEC * 1000 << " ms\n";

// doesn't work without the following cout on vc++
std::cout << dest_array [0] ;

return 0;
}

Uncommenting the line
//#define NO_ALIASING_OPTIMIZATION
should make the compiler apply no aliasing optimizations.


Alexandre Courpron.



Report this thread to moderator Post Follow-up to this message
Old Post
courpron@gmail.com
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
Razii wrote:
> On Sun, 23 Mar 2008 15:40:44 +0000, Jon Harrop <usenet@jdh30.plus.com>
> wrote:
> 
>
> Make sure you are using server VM to run it ...

I am.

> that is, java -server Arrays

On my machine it is the default.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?u

Report this thread to moderator Post Follow-up to this message
Old Post
Jon Harrop
03-24-08 12:13 AM


Sponsored Links




Last Thread Next Thread Next
Pages (12): « 1 2 [3] 4 5 6 7 8 » ... Last »
Search this forum -> 
Post New Thread

C++ archive

Show a Printable Version Send to friend Email This Page to Someone! subscribe to this thread Receive updates to this thread
Computer Consultants
Programming Jobs
Visual Basic Controls
SQL Server Programming
Webservices
Java Security
Visual Studio
C# Programming
Visual J++
Software engineering
Open source Software
Perl Programming
PHP Programming
ASP Programming
ASP .NET Programming
Visual Basic Programming
Windows Scripting Host
Java Programming
Java Help
Java Beans
VBScript
Cobol
MAC Applications
Unix Programming
Forum Jump:
All times are GMT. The time now is 09:44 PM.

 
Free MCSE Braindumps | Real Estate Topics

Programming forum archive

Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.