SIGSEGV in pthread_mutex_lock?

Henrik Goldman

2006-08-14, 7:00 pm

Hi,

I have an application which is a mixture of C and C++ code running
under Linux x86 (kernel 2.6.9).

I'm having a problem with pthread_mutex_lock, which seems to cause a
segmentation fault.

I use gcc/g++ v3.4.3 for compilation and have enabled "-pthread" for the
compiler and "-lpthread" for the linker.
However it seems that no matter what I do I get this segmentation fault.

The odd part is that it works if I turn off all optimizations (no -O1
or -O2).

It looks most like an issue of not enabling pthreads correctly, but I'm
not sure how to debug this. The code seems fine, since it works on other
systems, but naturally that's hard to argue.

Since it works in a debug build but not with optimizations, it could be
something subtle such as an uninitialized structure. Any ideas?

Thanks in advance.
-- Henrik


Eric Sosman

2006-08-14, 7:00 pm



Henrik Goldman wrote on 08/14/06 15:10:
> Hi,
>
> I have some an application which is a mixture of C and C++ code running
> under Linux x86 (kernel 2.6.9).
>
> I'm receiving a problem with pthread_mutex_lock which seems to do a
> segmentation fault.
>
> I use gcc/g++ v3.4.3 for compilation and have enabled "-pthread" for the
> compiler and "-lpthread" for the linker.
> However it seems that no matter what I do I get this segmentation fault.
>
> The odd part however is that if I turn off all optimizations (disable -O2
> or -O1) then it works. So in other words when there are no optimizations it
> seems to work as it should.
>
> Most of all it seems that it's an issue of not turning on pthreads correctly
> but I'm not sure if there is any way to debug this problem? The code seems
> fine as it works on other systems but naturally it's hard to argue.
>
> Since it seems to work with debug but not with optimizations it could be
> some subtle thing with uninitialized structures etc. Any ideas?


Look for uninitialized variables, especially uninitialized
stack-resident variables.

Look for "falling off the end" of non-void functions without
executing `return somevalue;'.

Turn up the compiler's warning levels as high as you can.
For gcc, use at least "-Wall -W -O2", and add "-ansi" or
"-std=whatever"; add "-pedantic" if your code is (supposed to
be) clean enough to withstand it.

If all else fails, load the core dump into a debugger and
start looking at the stacks ...
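
The two bug classes named above can be sketched in C as follows. This is a minimal illustration, not code from the application under discussion; `sign_of` and `sum_ints` are hypothetical names. Each function is shown in its corrected form, with the broken variant described in the comment, so the file stays well-defined. On gcc releases of that era the uninitialized-variable warning relied on data-flow analysis that only ran when optimizing, which is presumably why -O2 is paired with the warning flags.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bug class 1: falling off the end of a non-void function.
 * Broken form (kept in a comment only):
 *
 *     int sign_of(int x) { if (x > 0) return 1; if (x < 0) return -1; }
 *
 * For x == 0 no return statement runs, so the caller reads an
 * indeterminate value. gcc reports this under -Wall (-Wreturn-type).
 */
int sign_of(int x)
{
    if (x > 0)
        return 1;
    if (x < 0)
        return -1;
    return 0;               /* the path the broken form forgot */
}

/*
 * Bug class 2: an uninitialized stack-resident variable.
 * Broken form: "int total;" with no initializer. The result then
 * depends on whatever was left in that stack slot or register, which
 * can differ between optimized and unoptimized builds.
 */
int sum_ints(const int *values, size_t count)
{
    int total = 0;          /* the initializer the broken form lacked */
    size_t i;

    assert(values != NULL || count == 0);
    for (i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

Compiling a file like this with the flags suggested above should produce no warnings; restoring either broken form is expected to trigger one.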

--
Eric.Sosman@sun.com

Henrik Goldman

2006-08-14, 7:00 pm

Thanks for the suggestions. I found the culprit now...

After 6 hours of debugging and code analysis I found:

szString[nPos-1] = '\0';



The problem was that nPos was 0 in the specific case and thus it overwrote 1
byte of the instance pointer which the mutex was used with.
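
A guarded version of that statement might look like the sketch below. The helper name `chop_last` and its return convention are assumptions made for illustration; only the `szString[nPos-1] = '\0'` pattern comes from the post. The point is simply that the zero-length case is checked before the index is computed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: remove the last character of buf, where n_pos
 * is the current string length. Returns the new length.
 *
 * When n_pos is 0 there is nothing to remove, and buf[n_pos - 1]
 * would address the byte *before* the array, so the write is skipped.
 */
size_t chop_last(char *buf, size_t n_pos)
{
    assert(buf != NULL);

    if (n_pos == 0)
        return 0;           /* empty string: leave memory untouched */

    buf[n_pos - 1] = '\0';  /* safe: index is within [0, n_pos) */
    return n_pos - 1;
}
```

Routing every "terminate one character earlier" operation through a single checked helper also gives a test case one obvious place to exercise the empty-string input.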

Naturally the question is how I can avoid this next time. If there is an
automated way to catch it that I could include in a test case, I'd be
happy to hear about it.

Thanks in advance.

-- Henrik


Paul Pluzhnikov

2006-08-14, 7:00 pm

"Henrik Goldman" <henrik_goldman@mail.tele.dk> writes:

> After 6 hours of debugging and code analysis I found:
> szString[nPos-1] = '\0';
>
> The problem was that nPos was 0 in the specific case and thus it overwrote 1
> byte of the instance pointer which the mutex was used with.
>
> Naturally the question is how I can avoid this another time?


By careful coding and/or using automated error detection tools.

Valgrind would detect this problem if and only if szString is
dynamically-allocated.

Purify will detect it if szString is dynamically-allocated or global.

g++ 4.x with -fmudflap is supposed to detect this always, but it
doesn't work :-(
(at least it doesn't work for me, see
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19319)

Insure++ will detect this regardless of whether szString is dynamic,
global or on stack.

FWIW, I had exactly the same bug recently.
It showed up on only one of 6 platforms we build on, and only in one
"special" build. On all other platforms/builds the bug has survived
for 5 years (it was never observed).
We found it with Insure++ in about 5 minutes.
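
For a test case that does not depend on any of these tools, one option is hand-rolled guard bytes around the buffer under test. This is a different and much weaker technique than what Valgrind, Purify or Insure++ do, and every name below (`guarded_buf`, `guarded_init`, `guarded_data`, `guarded_intact`) and every constant is hypothetical. The buffer handed to the code under test sits in the middle of a larger array, so a small underrun or overrun lands in padding that the test can inspect afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GUARD_LEN  8        /* padding bytes on each side (arbitrary) */
#define GUARD_BYTE 0x5A     /* fill pattern (arbitrary) */
#define DATA_LEN   64       /* usable buffer size (arbitrary) */

/* One array holding: front guard | usable data | rear guard. */
typedef struct {
    unsigned char raw[GUARD_LEN + DATA_LEN + GUARD_LEN];
} guarded_buf;

/* Fill both guards with the pattern and zero the usable area. */
void guarded_init(guarded_buf *g)
{
    assert(g != NULL);
    memset(g->raw, GUARD_BYTE, sizeof g->raw);
    memset(g->raw + GUARD_LEN, 0, DATA_LEN);
}

/* Pointer to the usable area; this is what the code under test sees. */
char *guarded_data(guarded_buf *g)
{
    assert(g != NULL);
    return (char *)g->raw + GUARD_LEN;
}

/* Returns 1 if every guard byte still holds the pattern, else 0. */
int guarded_intact(const guarded_buf *g)
{
    size_t i;

    assert(g != NULL);
    for (i = 0; i < GUARD_LEN; i++) {
        if (g->raw[i] != GUARD_BYTE)
            return 0;
        if (g->raw[GUARD_LEN + DATA_LEN + i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}
```

A test would call `guarded_init`, pass `guarded_data(&g)` to the routine being checked, and then assert `guarded_intact(&g)`. The scheme only notices writes that fall within `GUARD_LEN` bytes of the buffer and that store something other than the fill pattern, so it complements the tools above and does not replace them.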

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.

Henrik Goldman

2006-08-15, 4:00 am

> By careful coding and/or using automated error detection tools.

Thank you for your description of the different tools.
The specific bug was on a stack variable. Thus by overwriting array[-1] I
would overwrite one byte from the previous stack frame which would be
another local variable.

> It showed up on only one of 6 platforms we build on, and only in one
> "special" build. On all other platforms/builds the bug has survived
> for 5 years (it was never observed).
> We found it with Insure++ in about 5 minutes.


Thanks for the suggestion. By your description, only Insure++ would have
helped. In my case the bug survived almost two years.
I also thought VS2005 had some stack protection? It didn't detect anything
in the tests I ran. I thought Microsoft had made some security-related
enhancements, especially "security cookies" to detect overwriting of
the stack.

-- Henrik


Bjorn Reese

2006-08-15, 4:00 am

Paul Pluzhnikov wrote:

> Purify will detect it if szString is dynamically-allocated or global.


It should be added that Purify does detect stack problems (in this case
a stack array bounds write) on some platforms (Solaris for example).

--
mail1dotstofanetdotdk

Paul Pluzhnikov

2006-08-15, 8:01 am

"Henrik Goldman" <henrik_goldman@mail.tele.dk> writes:

>
> Thank you for your description of the different tools.


There is one (newer) tool I forgot to mention: coverity.com ...
Their claim to fame is that they find bugs by static code analysis,
so you don't even need a test case that "exercises" the bug.

> The specific bug was on a stack variable. Thus by overwriting array[-1] I
> would overwrite one byte from the previous stack frame which would be
> another local variable.


It is extremely unlikely that "array[-1]" belongs to previous frame.
Much more likely it is a "register save area" of the current frame.
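
Whichever neighbour actually sits below the array (a saved register, as suggested here, or another local), one stray byte is enough to change a pointer-sized value. The sketch below forces a known layout with a struct purely as an assumption for illustration; a compiler is free to arrange a real stack frame differently, and the names `neighbours` and `clobber_before_text` are hypothetical. The write goes through an `unsigned char *` to the whole object, so the demonstration itself stays well-defined.

```c
#include <assert.h>
#include <stddef.h>

/* A pointer-sized value followed directly by a character array.
 * On common ABIs there is no padding between the two members. */
typedef struct {
    unsigned long tag;      /* stands in for the neighbouring value */
    char          text[16]; /* stands in for szString */
} neighbours;

/*
 * Mimics "text[n_pos - 1] = '\0'" using byte offsets into the struct.
 * Returns 1 if the byte written lies before the start of text
 * (that is, inside the neighbouring member), 0 otherwise.
 */
int clobber_before_text(neighbours *n, int n_pos)
{
    unsigned char *bytes = (unsigned char *)n;
    size_t target;

    assert(n != NULL);
    assert(n_pos >= 0 && n_pos <= (int)sizeof n->text);

    target = offsetof(neighbours, text) + (size_t)n_pos - 1;
    bytes[target] = 0;

    return target < offsetof(neighbours, text);
}
```

With `tag` preset to all one bits and `n_pos` equal to 0, the call zeroes one byte of `tag`; which byte of the value that is depends on the machine's byte order.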

> I thought also VS2005 had some stack protection?


It does, but you must enable it with /GS

> I thought microsoft did some security related
> enhancements esp by installing "security cookies" to detect overwriting of
> stacks.


AFAICT, it detects only *some* overwrites past current frame's %esp/%rsp.
Only overwrites of return address, IIRC.
Again, "array[-1]" is somewhat unlikely to extend that far.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.

Paul Pluzhnikov

2006-08-15, 8:01 am

Bjorn Reese <breese@see.signature> writes:

> It should be added that Purify does detect stack problems (in this
> case a stack array bounds write) on some platforms (Solaris for
> example).


It does detect writes beyond stack pointer.

So, on a machine with stack growing down, "array[-1000]" will very
likely be detected (if there are no other large arrays in the
current frame), but "array[-1]" and "array[100]" (where size of
array is smaller than 100) almost certainly will not be.

Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.

Copyright 2008 codecomments.com