Sun Studio code hangs at fork, GCC code runs fine
| I am encountering a strange problem: code compiled with GCC runs fine,
but the same code compiled with the Sun Studio 12 compiler hangs at a fork
call after an initial set of fork calls.
I'd appreciate your help in solving this.
The code was originally written for GCC; I started using Sun Studio with a
view to using Sun's thread analyzer. I made a couple of changes which I think
are innocuous:
(a) added #ifdef __SUNPRO_CC char * __FUNCTION__ = "name"; #endif
(b) one more #ifdef around a call to ctime_r, because the g++ include
expects 2 parameters while Sun Studio 12 expects 3 parameters.
The g++ binary was compiled on Solaris 9 (1 x sparcv9) using GCC 3.3.3,
while the Sun Studio 12 binary was compiled on Solaris 10 (4 x sparcv9).
Both runs were on the same Solaris 10 machine, the one with the Sun Studio
compiler.
The program is a multi-instance socket listener. The main program binds to a
socket, then forks off a set of processes which accept connections; these
children then fork off other client applications based on the connection
data. After those clients exit, the children continue accepting, etc. The
main program only monitors the child processes, records their status and
restarts any that die.
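The monitoring role described above could be sketched roughly as follows. This is only an illustration under assumptions, not the poster's code: supervise, start_worker and demo_worker are made-up names, the restart limit exists only so the sketch terminates, and a real worker would loop on accept() on the listening socket.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*worker_fn)(int listen_fd);

// Stand-in worker (hypothetical): a real one would loop on accept(listen_fd, ...).
int demo_worker(int /*listen_fd*/) { return 0; }

// Fork one child that runs `work` and exits with its return value.
// Returns the child's pid in the parent, or -1 if fork() failed.
static pid_t start_worker(worker_fn work, int listen_fd) {
    pid_t pid = fork();
    if (pid == 0) {
        _exit(work(listen_fd));   // child never returns into the supervisor
    }
    return pid;
}

// Start `nworkers` children and replace each one that exits, up to
// `max_restarts` replacements. Returns the number of restarts performed,
// or -1 if a fork() failed.
int supervise(worker_fn work, int listen_fd, int nworkers, int max_restarts) {
    int running = 0;
    int restarts = 0;
    for (int i = 0; i < nworkers; ++i) {
        if (start_worker(work, listen_fd) < 0) return -1;
        ++running;
    }
    while (running > 0) {
        int status = 0;
        pid_t pid = waitpid(-1, &status, 0);   // wait for any child to exit
        if (pid < 0) {
            if (errno == EINTR) continue;      // interrupted by a signal, retry
            break;                             // no children left or real error
        }
        --running;
        if (restarts < max_restarts) {
            if (start_worker(work, listen_fd) < 0) return -1;
            ++running;
            ++restarts;
        }
    }
    return restarts;
}
```

With two workers that exit immediately and a limit of three restarts, supervise(demo_worker, -1, 2, 3) would restart three times and then drain the remaining children.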
This is working as expected when compiled with g++.
The exact same code (note the two sets of #ifdefs above), when compiled
with Sun PRO CC, always hangs at the second set of forks. That is, it goes
through the first set of forks fine, then when a connection is established,
it reads the data but hangs right at the fork call.
The application code at this point is:
char * p = pathfind( getenv("PATH"), progname );
if ( !p ) { printError(); return false; }
pid = fork();
....
The truss for the Sun Studio code shows
23914: access("/usr/bin/progname", X_OK) = Err#2 ENOENT
23914: access("/usr/you/bin/progname", X_OK) = 0
23914: lwp_park(0x0000000, 0) (sleeping)
while the truss for the g++ code shows
23986: access("/usr/bin/progname", X_OK) = Err#2 ENOENT
23986: access("/usr/you/bin/progname", X_OK) = 0
23996: fork1() = 24005
What am I doing wrong?
Thanks and best regards
| |
| Frank Cusack 2008-02-08, 7:17 pm |
| On Thu, 7 Feb 2008 17:03:28 -0800 (PST) A <ad_101@yahoo.com> wrote:
> (a) added #ifdef __SUNPRO_CC char * __FUNCTION__ = "name"; #endif
i assume the above is just for brevity and you don't actually have
a semicolon as part of the macro. you might want to unconditionally
#define __FUNCTION__ __func__
instead, and then you'll get the actual function name in sunpro C as
well as g++.
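A minimal sketch of that suggestion, assuming a compiler that provides the standard __func__ identifier. The helper name where is made up for illustration, and since names with double underscores are reserved, some compilers may warn about the #define.

```cpp
#include <cassert>
#include <cstring>

// Map the GCC-style spelling onto the standard identifier, as suggested above.
#define __FUNCTION__ __func__

// Hypothetical helper: returns the name of the enclosing function.
// __func__ names a static array, so returning a pointer to it is safe.
const char *where() {
    return __FUNCTION__;
}
```

Unlike a hand-written char * __FUNCTION__ = "name", this picks up the real name in each function where it is used.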
> (b) one more #ifdef around a call to ctime_r because the g++ include
> expects 2 parameters
> while Sun Studio 12 expects 3 parameters.
because you didn't #define _POSIX_PTHREAD_SEMANTICS. See ctime_r(3c).
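As a rough illustration, assuming the Solaris headers select the two-argument POSIX form of ctime_r when _POSIX_PTHREAD_SEMANTICS is defined before any system header (on other systems the macro is simply ignored). The helper name format_time is made up.

```cpp
// Must come before any system header so that <time.h> declares the
// two-argument POSIX form of ctime_r on Solaris.
#ifndef _POSIX_PTHREAD_SEMANTICS
#define _POSIX_PTHREAD_SEMANTICS
#endif

#include <cassert>
#include <cstring>
#include <time.h>

// Hypothetical helper: formats t into buf, which must hold at least 26 bytes.
// Returns buf on success or a null pointer on failure.
char *format_time(time_t t, char *buf) {
    return ctime_r(&t, buf);
}
```

With this in place the same call should compile under both compilers, so the extra #ifdef around ctime_r would no longer be needed. The macro can also be passed on the command line as -D_POSIX_PTHREAD_SEMANTICS.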
> The g++ is compiled on a Solaris 9 (1 x sparcv9) using GCC 3.3.3
> while the Sun Studio 12 is on a Solaris 10 ( 4 x sparcv9).
>
> Both runs were on the same Solaris 10 machine which has the compiler.
that's a mistake unless you re-ran fix-includes (I've never seen a
gcc or g++ package that did this correctly).
> The program is a multi instance socket listener, the main program
> binds to a socket, then forks off a set of processes which accept
> connections, then they fork off other client applications based on
> the connection data; after those clients exit, the children continue
> accepting etc. The main program only monitors the child processes,
> records their status and starts if any of those die.
>
> This is working as expected when compiled with g++.
>
> The exact same code (note the two sets of #ifdefs above) when
> compiled with Sun PRO CC alway hangs at the second set of fork. That
> is, it goes through the first set of forks fine, then when a
> connection is established, it reads the data but hangs right at the
> fork call.
>
> The application code at this point is:
> char * p = pathfind( getenv("PATH"), progname );
> if ( !p ) { printError(); return false; }
> pid = fork();
> ....
>
> The truss for the Sun Studio code shows
>
> 23914: access("/usr/bin/progname", X_OK) = Err#2 ENOENT
> 23914: access("/usr/you/bin/progname", X_OK) = 0
> 23914: lwp_park(0x0000000, 0) (sleeping)
>
> while the truss for the g++ code shows
>
> 23986: access("/usr/bin/progname", X_OK) = Err#2 ENOENT
> 23986: access("/usr/you/bin/progname", X_OK) = 0
> 23996: fork1() = 24005
>
> What am I doing wrong?
most interesting. my first guess would of course normally be that
something is wrong with your code, but i'm having a hard time coming
up with a possible error that would work in one case but not the
other. your program isn't by any chance threaded is it? you didn't
say so, but since you did post to c.p.threads i figure that might be
the case. in threaded programs, you are not allowed to call any
async-signal-UNsafe functions in the child after fork(). note that in
S10 all programs are threaded, although i don't know if this rule
applies in that case (S10 does do some things differently if you have
only 1 thread vs multithreaded).
my first guess is that between fork() and exec() you are calling
an unsafe function, and that you have broken fixed includes in
your g++ installation which hides the problem by entering libc
differently. either that or not defining _POSIX_PTHREAD_SEMANTICS
is affecting g++ vs sunpro C differently.
see also fork(2) for info about fork() behavior on S10.
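One way to stay within that rule is to finish every lookup (pathfind, getenv and so on) before fork() and to call only async-signal-safe functions such as execv and _exit in the child. A rough sketch, not the poster's actual code; run_and_wait is a made-up name and the path is assumed to be already resolved:

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs the program at `path` with arguments `args` and
// returns its exit status, or -1 on failure or abnormal termination.
int run_and_wait(const char *path, char *const args[]) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                 // fork failed
    }
    if (pid == 0) {
        // Child: only async-signal-safe calls from here on.
        execv(path, args);
        _exit(127);                // exec failed; skip atexit handlers and stdio
    }
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;   // retry only when interrupted
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything that allocates memory, writes through stdio or takes a lock belongs before the fork() or after the exec.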
-frank
| |
|
| On Feb 8, 3:34 pm, Frank Cusack <fcus...@fcusack.com> wrote:
> On Thu, 7 Feb 2008 17:03:28 -0800 (PST) A <ad_...@yahoo.com> wrote:
>
> i assume the above is just for brevity and you don't actually have
> a semicolon as part of the macro. you might want to unconditionally
>
> #define __FUNCTION__ __func__
>
> instead, and then you'll get the actual function name in sunpro C as
> well as g++.
>
> because you didn't #define _POSIX_PTHREAD_SEMANTICS. See ctime_r(3c).
>
> that's a mistake unless you re-ran fix-includes (I've never seen a
> gcc or g++ package that did this correctly).
>
[[CLIP]]
>
> most interesting. my first guess would of course normally be that
> something is wrong with your code, but i'm having a hard time coming
> up with a possible error that would work in one case but not the
> other. your program isn't by any chance threaded is it? you didn't
> say so, but since you did post to c.p.threads i figure that might be
> the case. in threaded programs, you are not allowed to call any
> async-signal-UNsafe functions in the child after fork(). note that in
> S10 all programs are threaded, although i don't know if this rule
> applies in that case (S10 does do some things differently if you have
> only 1 thread vs multithreaded).
>
> my first guess is that between fork() and exec() you are calling
> an unsafe function, and that you have broken fixed includes in
> your g++ installation which hides the problem by entering libc
> differently. either that or not defining _POSIX_PTHREAD_SEMANTICS
> is affecting g++ vs sunpro C differently.
>
> see also fork(2) for info about fork() behavior on S10.
>
> -frank
Thanks for your message.
Yes, the first macro is actually on 3 lines; I wrote it on 1 line to
make the post shorter.
Thanks for your tip on __func__. I wasn't aware of that feature
and will use it instead.
And will also include the def you said for ctime_r.
Yes, the program uses pthreads, though it runs only the main thread
for most of the time.
Perhaps it is using some non-safe function at some time and maybe that
is causing the different behaviour in the SunPRO runtime.
The GCC install is old and it probably doesn't have fix-includes.
The program calls some old libraries which were not written for
threads, however, either the logic of the program is such that such
calls are made when the program only has the main thread or the calls
are serialized by use of mutexes in the new code. Of course the entire
code base was re-compiled with the Sun PRO CC using the same compiler
flags (incl -mt).
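The serialization described above might look roughly like the sketch below. It is an assumption-laden illustration, not the actual code: legacy_lookup stands in for one of the old non-thread-safe library routines, and safe_lookup and legacy_lock are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <pthread.h>

static pthread_mutex_t legacy_lock = PTHREAD_MUTEX_INITIALIZER;

// Stand-in for a legacy routine that is not thread-safe because it
// returns a pointer to static storage.
static const char *legacy_lookup(int code) {
    static char buf[32];
    std::snprintf(buf, sizeof buf, "code-%d", code);
    return buf;
}

// Serialized wrapper: calls the legacy routine and copies its result out
// while holding the lock, so callers never share the static buffer.
bool safe_lookup(int code, char *out, std::size_t outlen) {
    if (out == 0 || outlen == 0) return false;
    pthread_mutex_lock(&legacy_lock);
    const char *r = legacy_lookup(code);
    std::strncpy(out, r, outlen - 1);
    out[outlen - 1] = '\0';
    pthread_mutex_unlock(&legacy_lock);
    return true;
}
```

A mutex like this that is held by another thread at the moment of fork() stays locked in the child, which is one reason the child is restricted to async-signal-safe calls until it execs.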
Will recompile using the _POSIX_PTHREAD_SEMANTICS and see if it makes
any difference.
Thanks.
| |
|
| I posted this to Sun.com forum also and one poster there said:
// begin quote
you might be running into OS issues and not compiler issues.........
By "OS issues", I mean
1. The program uses unstable or undocumented OS interfaces, or
2. A bug that is new in Solaris 10.
A bug was identified in Solaris 10 that could cause forks to hang. It
showed up in the libpkcs11 library as used by the JVM. I don't know
whether it showed up elsewhere.
This bug was fixed in Solaris 10u2. If you are using an earlier
version of Solaris 10, you could try upgrading.
// end quote
Originally I was running the code on Solaris 10; now I have run both
binaries on Solaris 9 and both run fine.
IOW, code compiled with SunPRO on Solaris 10 does NOT hang on Solaris 9;
it proceeds with the fork as expected. Please note that I have not yet
recompiled the code since my original post.
Is there a patch that I should install on the Solaris 10? Running
uname -a prints
SunOS host1 5.10 Generic_118833-33 sun4u sparc SUNW,UltraAX-MP
It could also be that there's something really wrong with my code,
maybe unsafe functions in the older libraries.
Best regards