Code Comments

Programming Forum and web based access to our favorite programming groups.
For Programmers: Free Programming Magazines | New: Database administration forum
Registration is free! Edit your profileCalendarFind other membersFrequently Asked QuestionsSearch -> 
Post New Thread











Thread
Author

Array optimizing problem in C++?
From an old post by James Kanze

On Apr 9 2003, 5:42 pm, ka...@gabi-soft.de (James Kanze) wrote:

>When I pass an "array" to a function in C/C++, I actually pass
>a pointer to the first element.  And the compiler, when it compiles the
>function, only sees pointers -- it has no way of determining any
>relationship between them. Consider, for example, a smoothing function
>(in C or earlier C++):
>
>    void
>    smooth( double* dest,
>            double const* src,
>            size_t len )
>    {
>        for ( size_t i = 1 ; i < len - 1 ; ++ i ) {
>            dest[ i - 1 ] = (src[ i - 1 ] + src[ i ] + src[ i + 1 ]) / 3 ;
>        }
>    }
>
>
>The compiler cannot arrange to use the src[ i + 1 ] value from the
>preceding pass through the loop as the src[ i ] value in the current
>pass, since the write to dest[ i - 1 ] might have changed it.  In Java,
>it can, since two arrays are either identical or disjoint.
>
>
>This sort of code shows up frequently.  In benchmarks from Java
>suppliers comparing Java to C++, of course:-).  But also in any number
>of applications dealing with physical measures: meteorology, geophysical
>research (oil prospection), etc.

Out of curiosity, I tried to test if the above is true. It didn't make
any difference. In fact, C++ was a bit faster (not by much, just 6%).
Probably due to array bound check in Java, if there is in indeed an
issue with C++ arrays, overall there is no difference.



==c++==
#include <iostream>
#include <cstdlib>
#include <ctime>

void fill (double* src, int len);
void smooth (double* dest,
double const* src,
int len );

int main()
{
const int len = 50000;

double src_array [len] = {0};
double dest_array [len] = {0};


fill(src_array, len);

clock_t start=clock();

for (int i = 0; i < 10000; i++)
smooth (dest_array, src_array, len);


clock_t endt=clock();

std::cout <<"Time smooth(): " <<
double(endt-start)/CLOCKS_PER_SEC * 1000 << " ms\n";

// doesn't work without the following cout on vc++
std::cout << dest_array [0] ;

return 0;
}


void smooth (double* dest,  double const* src, int len )
{
for (int i = 1 ; i < len - 1 ; i++ ) {
dest[ i - 1 ] = (src[ i - 1 ] + src[ i ] + src[ i + 1 ]) / 3 ;

}
}

void fill (double* src, int len)
{
srand((unsigned)std::time(0));

for (int i = 0 ; i < len ; ++ i ) {
src[i] = rand();
}
}




==== java ========

import java.util.*;

public class Arrays{
public static void main (String[] arg)
{
final int  LEN  = 50000;
double[] src_array  = new double [LEN];
fill(src_array, LEN);
double[] dest_array = new double [LEN];

long start = System.currentTimeMillis();

for (int i = 0; i < 10000; i++)
smooth(dest_array, src_array, LEN);

long end = System.currentTimeMillis();

System.out.println("Time smooth(): " + (end - start) + " ms");


}


static void smooth (double[] dest, double[] src, int len )
{
for (int i = 1 ; i < len - 1 ; i++ ) {
dest[ i - 1 ] = (src[ i - 1 ] + src[ i ] + src[ i + 1 ]) / 3 ;
}
}


static void fill (double[] src, int len)
{
for (int i = 0 ; i < len; i++)
src[i] = Math.random();
}

}


Report this thread to moderator Post Follow-up to this message
Old Post
Razii
03-23-08 09:41 AM


Re: Array optimizing problem in C++?
On Mar 23, 4:08 am, Razii <DONTwhatever...@hotmail.com> wrote:
> From an old post by James Kanze
>
> On Apr 9 2003, 5:42 pm, ka...@gabi-soft.de (James Kanze) wrote:
>
>
> 
> 
> 
> 
>
> Out of curiosity, I tried to test if the above is true. It didn't make
> any difference. In fact, C++ was a bit faster (not by much, just 6%).
> Probably due to array bound check in Java, if there is in indeed an
> issue with C++ arrays, overall there is no difference.

If you are going for blinding speed be sure to use proper optimization
flags; otherwise you are only calculating benchmarks for poorly
optimized code, which, like most of your benchmarks, is meaningless:

$ g++ -O0 smooth.cpp
9884.867 ms

$ g++ -O3 smooth.cpp
8791.123 ms

$ g++ -O3 -mfpmath=sse -msse3 -march=prescott -ffast-math smooth.cpp
1207.944 ms

$ g++ -O3 -mfpmath=sse -msse3 -march=prescott -ffast-math -funroll-
loops smooth.cpp
1084.385 ms

This is your code slightly modified to use QueryPerformanceCounter for
timing (using Windows). Most of the speed up is a result of -ffast-
math.

Based on the quality and rigor of the other tests I've seen you do,
I'd guess that you missed a few optimization flags on the Java side as
well.

Jason

Report this thread to moderator Post Follow-up to this message
Old Post
jason.cipriani@gmail.com
03-23-08 09:41 AM


Re: Array optimizing problem in C++?
On Mar 23, 4:35 am, "jason.cipri...@gmail.com"
<jason.cipri...@gmail.com> wrote:
> If you are going for blinding speed be sure to use proper optimization
> flags; otherwise you are only calculating benchmarks for poorly
> optimized code, which, like most of your benchmarks, is meaningless:
>
> $ g++ -O0 smooth.cpp
> 9884.867 ms
>
> $ g++ -O3 smooth.cpp
> 8791.123 ms
>
> $ g++ -O3 -mfpmath=sse -msse3 -march=prescott -ffast-math smooth.cpp
> 1207.944 ms
>
> $ g++ -O3 -mfpmath=sse -msse3 -march=prescott -ffast-math -funroll-
> loops smooth.cpp
> 1084.385 ms

I left one out:

$ g++ -03 -mfpmath=sse -msse3 smooth.cpp
7379.883 ms

Compare to -O3 without having GCC generate SSE instructions for you.

Jason

Report this thread to moderator Post Follow-up to this message
Old Post
jason.cipriani@gmail.com
03-23-08 09:41 AM


Re: Array optimizing problem in C++?
Also, wrt James' original post:
 
> 

I am not sure what you would expect in either language. I believe that
James had been expecting it to cache [i+1] in an fpu register, and use
that instead of accessing the value the next time through.

As it stands right now, VS at least (I didn't check GCC) generates
assembler instructions that are more equivalent to this:

fpu_register = src[i - 1]
fpu_register += src[i]
fpu_register += src[i + 1]
fpu_register /= 3
dest[i - 1] = fpu_register

Using fld, fadd, fdiv, and fstp on Intel machines. It never loads
src[i + 1] anyways. I have not tested this or done any research, but I
suspect this is still a bit faster than:

fpu_register = src[i - 1]
fpu_register += other_fpu_register
other_fpu_register = src[i + 1]
fpu_register += other_fpu_register
fpu_register /= 3
dest[i - 1] = fpu_register

Which I believe is what the original expected "optimization" was.

I have removed comp.lang.java.programmer from this one, since it is
unrelated to Java.

Jason

Report this thread to moderator Post Follow-up to this message
Old Post
jason.cipriani@gmail.com
03-23-08 09:41 AM


Re: Array optimizing problem in C++?
On Sun, 23 Mar 2008 01:35:06 -0700 (PDT), "jason.cipriani@gmail.com"
<jason.cipriani@gmail.com> wrote:

>If you are going for blinding speed be sure to use proper optimization
>flags; otherwise you are only calculating benchmarks for poorly
>optimized code, which, like most of your benchmarks, is meaningless:

Where did you get that I am not using proper optimizing flag? I said
nothing about flags in the last post.

>$ g++ -O0 smooth.cpp
>9884.867 ms

I am using vc++

>I'd guess that you missed a few optimization flags on the Java side as
>well.

There are no optimization flags for java compiler or vm. Post proof
and links.

Report this thread to moderator Post Follow-up to this message
Old Post
Razii
03-24-08 12:12 AM


Re: Array optimizing problem in C++?
Razii wrote:
> On Sun, 23 Mar 2008 01:35:06 -0700 (PDT), "jason.cipriani@gmail.com"
> <jason.cipriani@gmail.com> wrote:
> 
>
> Where did you get that I am not using proper optimizing flag? I said
> nothing about flags in the last post.
> 
>
> I am using vc++
>

Well, then, you can't make sweeping statements about C++ vs. Java.  You
can only make statements about *a particular implementation* of C++ vs.
Java.

Report this thread to moderator Post Follow-up to this message
Old Post
red floyd
03-24-08 12:12 AM


Re: Array optimizing problem in C++?
red floyd wrote:
> Well, then, you can't make sweeping statements about C++ vs. Java.  You
> can only make statements about *a particular implementation* of C++ vs.
> Java.

This is one of the procedural flaws common to all benchmarks.  Generally
speaking, one can only hope for full disclosure of the conditions for a
benchmark, and that those conditions approximate some meaningful aspect of
one's actual business need.

This does not mean that benchmarks are useless, only that one cannot decry
them for limitations inherent to the benchmark process itself.

--
Lew

Report this thread to moderator Post Follow-up to this message
Old Post
Lew
03-24-08 12:12 AM


Re: Array optimizing problem in C++?
On Sun, 23 Mar 2008 15:45:21 GMT, red floyd <no.spam@here.dude> wrote:

>Well, then, you can't make sweeping statements about C++ vs. Java.  You
>can only make statements about *a particular implementation* of C++ vs.
>Java.

I used the proper flag /O2 in vc++. Also, when you are deploying a
commercial software, you will have to use flags that target the
least-common-denominator processor. That's a divantage of C++ vs
JIT language. The JIT compiler knows what processor it is running on,
and can generate code specifically for that processor. Thus, I won't
use anything other than /O2 for c++... because, as I said, when you
are deploying a commercial software, you will have to use flags that
target the least-common-denominator processor anyway.



Report this thread to moderator Post Follow-up to this message
Old Post
Razii
03-24-08 12:12 AM


Re: Array optimizing problem in C++?
On Mar 23, 9:08=A0am, Razii <DONTwhatever...@hotmail.com> wrote:
> From an old post by James Kanze
>
> On Apr 9 2003, 5:42 pm, ka...@gabi-soft.de (James Kanze) wrote:
>
>
>
>
> 
 
> 
i + 1 ]) / 3 ; 
> 
 
> 
>
> Out of curiosity, I tried to test if the above is true. It didn't make
> any difference. In fact, C++ was a bit faster (not by much, just 6%).
> Probably due to array bound check in Java, if there is in indeed an
> issue with C++ arrays, overall there is no difference.

The issue C++ has with arrays is known as pointer aliasing. C has the
keyword "restrict" to deal with this problem but not C++. The
"restrict" keyword should be added in the next C++ standard. Most C++
compilers currently propose their own "restrict" keywords though.
Anyway, C++ has std::valarray. Arrays constructed with std::valarray
are free from aliasing. So, in the absolute, the Java language has no
performance edge over the current version of the C++ language in this
domain. However, even if theorically, std:valarray is free from
pointer aliasing, in practice, this isn't always the case, depending
on the compiler.

> if there is in indeed an issue with C++ arrays, overall there is no differ=[/color
]
ence.

With the right optimization flags, the smooth function would probably
get inlined, and static analysis of pointer aliasing should happen,
allowing the C++ compiler to perform no aliasing optimizations.

> Probably due to array bound check in Java

A good java compiler should have applied bounds checking elimination
in this case especially since the smooth function should be inlined.

As a side note, your program should handle ints instead of doubles, to
focus more on the performance of array access.

Alexandre Courpron.

Report this thread to moderator Post Follow-up to this message
Old Post
courpron@gmail.com
03-24-08 12:12 AM


Re: Array optimizing problem in C++?
On Mar 23, 12:00 pm, Razii <DONTwhatever...@hotmail.com> wrote:
> On Sun, 23 Mar 2008 15:45:21 GMT, red floyd <no.s...@here.dude> wrote: 
>
> I used the proper flag /O2 in vc++. Also, when you are deploying a
> commercial software, you will have to use flags that target the
> least-common-denominator processor. That's a divantage of C++ vs
> JIT language.

There are three things that are not correct about this:

1) The -ffast-math flag to GCC does not use any special processor
instructions. It simply removes certain assumptions, which is safe
when you know they aren't true. This is the flag with the major
performance increase. Do not confuse "targeting the least-common-
denominator processor" with "not using an ideal compiler for the
application". Whether you compile your commercial code with GCC or VC+
+ is irrelevant to what machine you are targetting. If your goal is to
target the least-common-denominator, then you would leave off, for
example, the SSE optimizations. However, in my example, it was -ffast-
math that was responsible for the largest speedup. Target platform is
not relevant.

2) It's more of a consequence of only distributing the least-common-
denominator than of using C++. That's a choice you can make. It would
be completely reasonable, for example, for a vendor of high-
performance research applications to distribute SSE-optimized builds.
E.g. Folding@Home (non-commercial, http://folding.stanford.edu)
maintains a GPU-optimized version of their software as well as a
"normal" one; and their non-GPU client checks for various processor
extensions at run-time (such as SSE) and uses optimized code
accordingly. This is the same with 32- and 64-bit applications. This
is also common in OpenGL application development; where applications
can look for specific OpenGL features and take advantage of them,
rather than always only targeting the least-common-denominator of
graphics hardware. The same thing applies to Java wrt OpenGL.
Depending on how you *want* to look at it, an Intel machine with SSE
support is just as different from an Intel machine without SSE as it
is from a PowerPC with AltiVec (same with multicore machines). It just
so happens that you can easily target the least-common-denominator for
those Intel machines, but nothing stops you from releasing platform-
optimized builds. So it is not a divantage of C++ vs. anything; it
is a divantage of sticking to the last-common-denominator.

3) A JIT language has an extra layer of abstraction before the
hardware. This makes it difficult to perform these particular kinds of
optimizations, anyways. It's apples and oranges, the Java JIT compiler
is a completely different kind of beast than a native code compiler,
the optimizations available for a JIT compiler to make and the
approaches it can take to making them are too different from those of
a native compiler to make a valid comparison. They're just different
tools.

What you are talking about is a difference between targeting the least-
common-denominator of platforms, and targeting specific platforms --
which is an issue present no matter what language you are using. You
are not talking about a difference between any two specific languages.
Furthermore, what *I* was talking about is a difference between
different compilers, targeting the same platform (I am sorry my point
wasn't clear, I had meant to show that the compiler can generate very
fast code for you, in particular -ffast-math [which does not target
specific hardware features] on GCC moreso than the the architecture-
specific options).

> The JIT compiler knows what processor it is running on,
> and can generate code specifically for that processor.

True, and a valid point. However, depending on the nature of what you
are compiling, you easily could lose a lot of information about the
original intention of the code in the original optimizing pass that
limits how effective processor-specific optimizations made by the JIT
compiler can be.

For example (this is a very specific example but please extend it to
general cases): let's say you have some Java code that, I don't know,
multiplies the elements of two large arrays of floating-point values
together in a loop. You compile it to bytecode, and in the process
javac unrolls that loop to optimize a bit. Then you run it on a
platform that, say, provides a specific instruction for multiplying
large arrays of floating-point values together very quickly. If the
original compiler had targeted this platform to begin with, it could
have seen your access pattern and used that optimized instruction.
Instead it unrolled the loop, a least-common-denominator optimization.
Now the JIT compiler sees the unrolled loop, and because it does not
see the sequential access pattern that was very apparent in the
original code, it fails to analyze the unrolled loop and determine
that it should be "re-rolled" and use the optimized instruction
instead. Now this particular example might not be that great, I know,
because you could conceivably construct a JIT compiler that can
recognize and re-roll unrolled loops; but the general point is what I
said: you can potentially lose a lot of information by compiling code
that you can not recover, and this information could be critical to
performing the best optimizations off the bat.

Consequently, the JIT compiler must turn to other techniques for
optimization, such as caching compiled code for faster startup times,
etc. Again, it's a whole different beast.

> Thus, I won't
> use anything other than /O2 for c++... because, as I said, when you
> are deploying a commercial software, you will have to use flags that
> target the least-common-denominator processor anyway.

I addressed that above: Again, 1) some optimizations, like -ffast-math
or -funroll-loops (and be careful, of course, you can't blindly use
these, you have to make sure they make sense for your code), do not
target a specific processor, and 2) there is no rule that says you
have to target the least-common-denominator processor; and a heavy
duty research application, such as what James Kanze *originally*
cited, would most certainly take advantage of special features on the
target hardware.

Jason

Report this thread to moderator Post Follow-up to this message
Old Post
jason.cipriani@gmail.com
03-24-08 12:12 AM


Sponsored Links




Last Thread Next Thread Next
Pages (12): [1] 2 3 4 5 6 » ... Last »
Search this forum -> 
Post New Thread

C++ archive

Show a Printable Version Send to friend Email This Page to Someone! subscribe to this thread Receive updates to this thread
Computer Consultants
Programming Jobs
Visual Basic Controls
SQL Server Programming
Webservices
Java Security
Visual Studio
C# Programming
Visual J++
Software engineering
Open source Software
Perl Programming
PHP Programming
ASP Programming
ASP .NET Programming
Visual Basic Programming
Windows Scripting Host
Java Programming
Java Help
Java Beans
VBScript
Cobol
MAC Applications
Unix Programming
Forum Jump:
All times are GMT. The time now is 10:48 AM.

 
Free MCSE Braindumps | Real Estate Topics

Programming forum archive

Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.