Code Comments

Re: Array optimizing problem in C++?
On Mar 23, 11:12 am, Razii <DONTwhatever...@hotmail.com> wrote:
> I said
> nothing about flags in the last post.

As was pointed out in your other thread, these are important details
that you should not leave out; especially in this particular benchmark
(which is not I/O bound like the last one, and is heavily affected by
compiler optimizations).

Jason

jason.cipriani@gmail.com
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
<jason.cipriani@gmail.com> wrote in message
news:0d80b694-8609-4f80-9aeb-
e55bcae9a8b0@s8g2000prg.googlegroups.com...
> (I am sorry my point
> wasn't clear, I had meant to show that the compiler can generate very
> fast code for you, in particular -ffast-math [which does not target
> specific hardware features] on GCC, more so than the
> architecture-specific options).

Here are some key missing timings with GCC and VS, Razii:

$ g++ -O2 smooth.cpp
8677.623 ms

$ g++ -O2 -ffast-math smooth.cpp
1077.232 ms

$ g++ -O2 -ffast-math -funroll-loops smooth.cpp
919.622 ms

With CL 14 (VS2005):
7275.611 ms

No platform-specific optimizations were used. Note the order of
magnitude speed up using -ffast-math and -funroll-loops with GCC,
which generates code that can still be run on the least-common-
denominator of Intel-compatible platforms.

Also note that -O2 provides slightly better performance than -O3. Just
goes to show why I shouldn't be using -O3, I guess :-)

Jason


jason.cipriani@gmail.com
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
On Mar 23, 1:18 pm, "jason.cipri...@gmail.com"
<jason.cipri...@gmail.com> wrote:
> Also note that -O2 provides slightly better performance than -O3. Just
> goes to show why I shouldn't be using -O3, I guess :-)

I take that back, I got it down to 903ms with -O3 over -O2. Perhaps my
machine was just in a bad mood yesterday.

I guess I shouldn't post in this thread for a while, I'm getting a
little uncomfortable with my post ratio here. :-(

Jason

jason.cipriani@gmail.com
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
On Mar 23, 5:00 pm, Razii <DONTwhatever...@hotmail.com> wrote:
> On Sun, 23 Mar 2008 15:45:21 GMT, red floyd <no.s...@here.dude> wrote:
>
> I used the proper flag /O2 in vc++. Also, when you are deploying
> commercial software, you will have to use flags that target the
> least-common-denominator processor. That's a disadvantage of C++ vs
> a JIT language. The JIT compiler knows what processor it is running on,
> and can generate code specifically for that processor. Thus, I won't
> use anything other than /O2 for c++... because, as I said, when you
> are deploying commercial software, you will have to use flags that
> target the least-common-denominator processor anyway.

You are using Visual C++. You should use at least the following flags,
in order to enable whole program optimization and link-time code
generation:

cl /O2 /GL prog.cpp /link /ltcg

You should rerun all your benchmarks with at least those options
enabled. By the way, as I said earlier, you should use ints instead of
doubles. It will save you a lot of trouble.

Alexandre Courpron.


courpron@gmail.com
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
On Mar 23, 1:18 pm, "jason.cipri...@gmail.com"
<jason.cipri...@gmail.com> wrote:
> With CL 14 (VS2005):
> 7275.611 ms

With /O2 and LTCG enabled!

jason.cipriani@gmail.com
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
On 23 mar, 09:58, "jason.cipri...@gmail.com"
<jason.cipri...@gmail.com> wrote:
> Also, wrt James' original post:
>
>   dest[ i - 1 ] = (src[ i - 1 ] + src[ i ] + src[ i + 1 ]) / 3 ;

> I am not sure what you would expect in either language. I
> believe that James had been expecting it to cache [i+1] in an
> fpu register, and use that instead of accessing the value the
> next time through.

Exactly.  It's a very common optimization.

> As it stands right now, VS at least (I didn't check GCC)
> generates assembler instructions that are more equivalent to
> this:

>   fpu_register = src[i - 1]
>   fpu_register += src[i]
>   fpu_register += src[i + 1]
>   fpu_register /= 3
>   dest[i - 1] = fpu_register

> Using fld, fadd, fdiv, and fstp on Intel machines. It never
> loads src[i + 1] anyways. I have not tested this or done any
> research, but I suspect this is still a bit faster than:

>   fpu_register = src[i - 1]
>   fpu_register += other_fpu_register
>   other_fpu_register = src[i + 1]
>   fpu_register += other_fpu_register
>   fpu_register /= 3
>   dest[i - 1] = fpu_register

The fastest solution would pre-charge the first two values, and
only read one new value each time through the loop.  (Don't
forget that memory bandwidth will be the limiting factor for
this type of loop.)  Intel's stack architecture may make
optimizing this somewhat more difficult, but it should still be
possible.  You'll end up with more instructions, but fewer memory
accesses and better run time.

But of course, you can't do this in C++, because dest[ i -1 ]
might modify one of the values you're holding in register.  (Of
course, some compilers do generate two versions of the loop,
with code which checks for aliasing first, and uses one or the
other, depending on whether aliasing is present or not.  But
this isn't the usual case.)
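
To make the two points above concrete, here is a minimal sketch. It is an illustration only, not code posted in the thread, and the names smooth_naive, smooth_rotating, ranges_overlap, and smooth_checked are hypothetical. smooth_rotating holds the last two source values in locals and reads one new element per iteration; that is only equivalent to the plain loop when dest and src do not overlap, so smooth_checked tests for overlap first and falls back to the plain loop, roughly what a compiler that emits two versions of the loop has to do.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Plain loop from the thread: three loads from src per iteration.
void smooth_naive(double* dest, const double* src, std::size_t len) {
    assert(dest != nullptr && src != nullptr);
    for (std::size_t i = 1; i + 1 < len; ++i)
        dest[i - 1] = (src[i - 1] + src[i] + src[i + 1]) / 3;
}

// Hypothetical hand-optimized variant: load two values up front, then read
// only one new element per iteration. Correct only if dest and src do not
// overlap, because it never re-reads a value it already holds.
void smooth_rotating(double* dest, const double* src, std::size_t len) {
    assert(dest != nullptr && src != nullptr);
    if (len < 3) return;
    double prev = src[0];
    double cur = src[1];
    for (std::size_t i = 1; i + 1 < len; ++i) {
        const double next = src[i + 1];
        dest[i - 1] = (prev + cur + next) / 3;
        prev = cur;
        cur = next;
    }
}

// True if the written range [dest, dest + len - 2) and the read range
// [src, src + len) share any element. std::less_equal is used because it
// gives a total order even for pointers into different arrays.
bool ranges_overlap(const double* dest, const double* src, std::size_t len) {
    if (len < 3) return false;
    std::less_equal<const double*> le;
    return !(le(dest + (len - 2), src) || le(src + len, dest));
}

// Runtime alias check: pick the fast loop only when it is safe.
void smooth_checked(double* dest, const double* src, std::size_t len) {
    if (ranges_overlap(dest, src, len))
        smooth_naive(dest, src, len);
    else
        smooth_rotating(dest, src, len);
}
```

Calling smooth_naive(buf + 2, buf, n) and smooth_rotating(buf + 2, buf, n) on two copies of the same buffer gives different results: each store to dest[i - 1] lands on src[i + 1], which the plain loop re-reads on the next iteration while the rotating version still holds the old value.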

--
James Kanze (GABI Software)             email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

James Kanze
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
On Sun, 23 Mar 2008 10:29:44 -0700 (PDT), James Kanze
<james.kanze@gmail.com> wrote:
 
>
>Exactly.  It's a very common optimization.

Post an example that I can test where aliasing is a problem. In the
example that I posted, c++ was faster.



Razii
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
On Sun, 23 Mar 2008 10:23:07 -0700 (PDT), courpron@gmail.com wrote:

>You should make again all your benchmarks with at least those options
>enabled

What other benchmarks? :) I only posted one I/O example.  This
post was not a benchmark but a request to Kanze to post an
example (that I can test) where aliasing in C++ makes optimizing
arrays hard (or impossible). With the example that he gave (which I used),

for ( size_t i = 1 ; i < len - 1 ; ++ i ) {
    dest[ i - 1 ] = (src[ i - 1 ] + src[ i ] + src[ i + 1 ]) / 3 ;
}

C++ was in fact faster. Obviously aliasing was not an issue in this case,
contrary to what he claimed.


Razii
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
On Mar 23, 11:40 am, Jon Harrop <use...@jdh30.plus.com> wrote: 
> GCC's -ffast-math option breaks semantics, so it is not a valid optimization.

Only sometimes; and it's a valid optimization. Specifically, in this
case, the results are identical. Mostly, in my experience, you start
to lose precision with -ffast-math when you start doing things beyond
simple arithmetic, such as sqrt() and cos(), or when you get into the
realm of overflows and NaNs.
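
As a small illustration of how relaxed floating-point rules can change a result (a sketch, not from the thread): -ffast-math allows the compiler to reassociate additions, and floating-point addition is not associative. The function names below are hypothetical. The values mentioned afterwards assume IEEE 754 double arithmetic without extended intermediate precision (for example SSE2 on x86-64) and a build without -ffast-math, so that each function keeps the grouping it is written with.

```cpp
// Sum grouped as (a + b) + c.
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

// Sum grouped as a + (b + c). Mathematically the same as sum_left,
// but each intermediate result is rounded differently.
double sum_right(double a, double b, double c) {
    return a + (b + c);
}
```

With a = 1e16, b = -1e16, c = 1.0, sum_left gives 1.0 while sum_right gives 0.0, because -1e16 + 1.0 rounds back to -1e16. The smoothing loop only adds three values of similar magnitude, so effects this large are not expected there.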

In case anybody is curious, the Intel compiler yields results similar
to VS, and to GCC with SSE3 enabled (but no -ffast-math), which is the
expected result:

icl /Ox /QxP /Qipo /Qunroll-aggressive smooth.cpp

That was about 7400 ms for me. Adding /Qprec-div-:

icl /Ox /QxP /Qipo /Qprec-div- /Qunroll-aggressive smooth.cpp

dropped it down to about 1100 ms (ICC's /Qprec-div- is similar in
spirit to GCC's -ffast-math).
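
One plausible source of the speedup (an assumption, not something measured in this thread) is that both options let the compiler replace the division by 3 with a multiplication by a precomputed reciprocal, which is typically much cheaper than a floating-point divide but is not always bit-identical. The sketch below writes both forms by hand; smooth_div, smooth_recip, and max_abs_diff are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>

// Strict form: one floating-point divide per output element.
void smooth_div(double* dest, const double* src, std::size_t len) {
    for (std::size_t i = 1; i + 1 < len; ++i)
        dest[i - 1] = (src[i - 1] + src[i] + src[i + 1]) / 3;
}

// Relaxed form: multiply by a reciprocal computed once. 1.0 / 3.0 is not
// exactly one third, so a result can differ from the strict form in its
// last bits.
void smooth_recip(double* dest, const double* src, std::size_t len) {
    const double third = 1.0 / 3.0;
    for (std::size_t i = 1; i + 1 < len; ++i)
        dest[i - 1] = (src[i - 1] + src[i] + src[i + 1]) * third;
}

// Largest absolute element-wise difference between two arrays of length n.
double max_abs_diff(const double* a, const double* b, std::size_t n) {
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double d = std::fabs(a[i] - b[i]);
        if (d > worst) worst = d;
    }
    return worst;
}
```

Comparing the whole output arrays with something like max_abs_diff, rather than a single element, gives a stronger check that two builds agree; any difference between these two forms should be no more than a few units in the last place.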

Following are three source files and a Makefile. I used MinGW GCC 3.4.5;
you will want to implement your own tick()/tock() functions; the
windows.h #include is only for those. The output, for me, is:

$ ./smooth.exe
no -ffast-math: 8796.27
-ffast-math: 923.052
1e-014
delta: 0
they are precisely equal.


==== Makefile ====

CFLAGS = -O2 -funroll-loops

.PHONY: clean

smooth.exe: smooth_main.cpp smooth_nofm.cpp smooth_fm.o
	g++ $(CFLAGS) smooth_main.cpp smooth_nofm.cpp smooth_fm.o -o $@

smooth_fm.o: smooth_fm.cpp
	g++ $(CFLAGS) -ffast-math -c $<

clean:
	rm -f smooth_fm.o smooth.exe


==== smooth_fm.cpp ====

void smooth_fm (double *dest, double const *src, int len) {
    for (int i = 1 ; i < len - 1 ; i++ )
        dest[ i - 1 ] = (src[ i - 1 ] + src[ i ] + src[ i + 1 ]) / 3 ;
}


==== smooth_nofm.cpp ====

void smooth_nofm (double *dest, double const *src, int len) {
    for (int i = 1 ; i < len - 1 ; i++ )
        dest[ i - 1 ] = (src[ i - 1 ] + src[ i ] + src[ i + 1 ]) / 3 ;
}


==== smooth_main.cpp ====

#include <algorithm>
#include <cstdlib>    // srand, rand
#include <ctime>
#include <iostream>
#include <string>
#include <windows.h>  // only needed for tick()/tock()

using namespace std;

LARGE_INTEGER s_tick;

void smooth_nofm (double *, double const *, int);
void smooth_fm (double *, double const *, int);

// Record the start time in s_tick.
void tick (void) {
    QueryPerformanceCounter(&s_tick);
}

// Print the milliseconds elapsed since the last tick().
void tock (const string &msg) {
    LARGE_INTEGER now, freq;
    QueryPerformanceCounter(&now);
    QueryPerformanceFrequency(&freq);
    // Divide in floating point so the counter frequency is not truncated.
    cout << msg << ": " <<
        (1000.0 * (double)(now.QuadPart - s_tick.QuadPart) /
         (double)freq.QuadPart) << endl;
}

// Fill src with pseudo-random values.
void fill (double *src, int len) {
    srand((unsigned)time(NULL));
    for (int i = 0; i < len; ++i)
        src[i] = rand();
}

int main () {

    const int len = 50000;
    // static: the three arrays total about 1.2 MB, which can exceed the
    // default stack size on some toolchains.
    static double src_array1[len];
    static double src_array2[len];
    static double dest_array[len];
    double fm, nofm;

    // Both versions get identical input.
    fill(src_array1, len);
    copy(src_array1, src_array1 + len, src_array2);

    tick();
    for (int i = 0; i < 10000; i++)
        smooth_nofm(dest_array, src_array1, len);
    tock("no -ffast-math");
    nofm = dest_array[0];

    tick();
    for (int i = 0; i < 10000; i++)
        smooth_fm(dest_array, src_array2, len);
    tock("-ffast-math");
    fm = dest_array[0];

    // Compare only the first output element of each version.
    cout << 0.00000000000001 << endl;
    cout << "delta: " << (fm - nofm) << endl;

    if (fm == nofm)
        cout << "they are precisely equal." << endl;

    return 0;

}

==== END ====
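
For readers who are not on Windows, tick() and tock() could be written portably along the following lines. This is a sketch under the assumption that a C++11 compiler is available (std::chrono did not exist in the GCC 3.4.5 used above); elapsed_ms is a hypothetical helper added here so the elapsed time can also be read without printing it.

```cpp
#include <chrono>
#include <iostream>
#include <string>

// Start time recorded by the most recent tick().
static std::chrono::steady_clock::time_point s_tick;

// Record the start time.
void tick() {
    s_tick = std::chrono::steady_clock::now();
}

// Milliseconds elapsed since the last tick(), as a double.
double elapsed_ms() {
    const std::chrono::duration<double, std::milli> d =
        std::chrono::steady_clock::now() - s_tick;
    return d.count();
}

// Print "<msg>: <elapsed ms>", matching the output format used above.
void tock(const std::string& msg) {
    std::cout << msg << ": " << elapsed_ms() << std::endl;
}
```

steady_clock is used rather than system_clock because it is monotonic, so an adjustment of the wall clock during a run cannot produce a negative or inflated interval.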

jason.cipriani@gmail.com
03-24-08 12:13 AM


Re: Array optimizing problem in C++?
On Mar 23, 2:55 pm, Razii <DONTwhatever...@hotmail.com> wrote:
> [snip]

Razii, you are benchmarking compiler optimization techniques, not
language differences. Again, just as with your I/O hardware
benchmarks, your tests have too many variables in them to be used
simply as a comparison between C++ and Java.

Jason

jason.cipriani@gmail.com
03-24-08 12:13 AM

