SIGSEGV in pthread_mutex_lock?
| Henrik Goldman 2006-08-14, 7:00 pm |
| Hi,
I have an application that is a mixture of C and C++ code running
under Linux x86 (kernel 2.6.9).
I'm having a problem with pthread_mutex_lock, which seems to cause a
segmentation fault.
I use gcc/g++ v3.4.3 for compilation and have enabled "-pthread" for the
compiler and "-lpthread" for the linker.
However, no matter what I do, I get this segmentation fault.
The odd part is that if I turn off all optimizations (drop -O2 or -O1),
it works as it should.
Most likely it's an issue of not enabling pthreads correctly, but I'm not
sure whether there is any way to debug this problem. The code seems fine,
as it works on other systems, but naturally that's hard to argue.
Since it works in a debug build but not with optimizations, it could be
some subtle thing such as uninitialized structures. Any ideas?
Thanks in advance.
-- Henrik
| |
| Eric Sosman 2006-08-14, 7:00 pm |
|
Henrik Goldman wrote on 08/14/06 15:10:
> Hi,
>
> I have an application that is a mixture of C and C++ code running
> under Linux x86 (kernel 2.6.9).
>
> I'm having a problem with pthread_mutex_lock, which seems to cause a
> segmentation fault.
>
> I use gcc/g++ v3.4.3 for compilation and have enabled "-pthread" for the
> compiler and "-lpthread" for the linker.
> However it seems that no matter what I do I get this segmentation fault.
>
> The odd part however is that if I turn off all optimizations (disable -O2
> or -O1) then it works. So in other words when there are no optimizations it
> seems to work as it should.
>
> Most of all it seems that it's an issue of not turning on pthreads correctly
> but I'm not sure if there is any way to debug this problem? The code seems
> fine as it works on other systems but naturally it's hard to argue.
>
> Since it seems to work with debug but not with optimizations it could be
> some subtle thing with uninitialized structures etc. Any ideas?
Look for uninitialized variables, especially uninitialized
stack-resident variables.
Look for "falling off the end" of non-void functions without
executing `return somevalue;'.
Turn up the compiler's warning levels as high as you can.
For gcc, use at least "-Wall -W -O2", and add "-ansi" or
"-std=whatever"; add "-pedantic" if your code is (supposed to
be) clean enough to withstand it.
If all else fails, load the core dump into a debugger and
start looking at the stacks ...
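
As an illustration of the first two points, here is a minimal sketch of the
corrected patterns. The function names sign_of and sum_first are hypothetical
and not taken from the original application. As far as I know, gcc of that
era could only report uninitialized variables when optimization was enabled,
which is one reason to keep -O2 on the command line next to -Wall -W.

```c
#include <assert.h>
#include <stddef.h>

/* Every path returns a value. Dropping the final "return 0;" would let
 * control fall off the end of a non-void function, which gcc reports
 * under -Wall ("control reaches end of non-void function"). */
static int sign_of(int x)
{
    if (x < 0)
        return -1;
    if (x > 0)
        return 1;
    return 0;
}

/* total is initialized before its first use. Left uninitialized, it would
 * start with whatever happened to be in its stack slot or register, and
 * that value typically differs between -O0 and -O2 builds. */
static long sum_first(const int *v, size_t n)
{
    long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

A bug of either kind can appear to work in an unoptimized build simply
because the stack layout happens to be forgiving there.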
--
Eric.Sosman@sun.com
| |
| Henrik Goldman 2006-08-14, 7:00 pm |
| Thanks for the suggestions. I found the culprit now...
After 6 hours of debugging and code analysis I found:
szString[nPos-1] = '\0';
The problem was that nPos was 0 in this specific case, so the write landed
at szString[-1] and overwrote one byte of the instance pointer that the
mutex was used with.
Naturally the question is how I can avoid this next time. If there were an
automated way to catch it that I could include in a test case, I'd be happy
about that.
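
At the call site itself, the simplest guard is to check the index before
writing. A minimal sketch, assuming a hypothetical helper terminate_before()
(not part of the original code) and assuming the caller knows the buffer
size:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: stores '\0' at index n_pos - 1 only when that index
 * lies inside a buffer of buf_size bytes.
 * Returns 1 if the byte was written, 0 if the write was skipped. */
static int terminate_before(char *sz, size_t buf_size, int n_pos)
{
    if (sz == NULL || n_pos <= 0 || (size_t)n_pos > buf_size)
        return 0;               /* n_pos == 0 would have hit sz[-1] */
    sz[n_pos - 1] = '\0';
    return 1;
}
```

With this in place, terminate_before(szString, sizeof szString, nPos) leaves
the buffer and its neighbours untouched when nPos is 0, and the return value
lets the caller decide whether that case is an error.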
Thanks in advance.
-- Henrik
| |
| Paul Pluzhnikov 2006-08-14, 7:00 pm |
| "Henrik Goldman" <henrik_goldman@mail.tele.dk> writes:
> After 6 hours of debugging and code analysis I found:
> szString[nPos-1] = '\0';
>
> The problem was that nPos was 0 in the specific case and thus it overwrote 1
> byte of the instance pointer which the mutex was used with.
>
> Naturally the question is how I can avoid this another time?
By careful coding and/or using automated error detection tools.
Valgrind would detect this problem if and only if szString is
dynamically-allocated.
Purify will detect it if szString is dynamically-allocated or global.
g++ 4.x with -fmudflap is supposed to detect this always, but it
doesn't work :-(
(at least it doesn't work for me, see
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=19319)
Insure++ will detect this regardless of whether szString is dynamic,
global or on stack.
FWIW, I had exactly the same bug recently.
It showed up on only one of the 6 platforms we build on, and only in one
"special" build. On all other platforms/builds the bug had survived
for 5 years (it was never observed).
We found it with Insure++ in about 5 minutes.
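
For the "include it in a test case" part, one tool-independent option is a
guard-zone check: hand the function a buffer that sits in the middle of a
larger, pattern-filled array and verify afterwards that the bytes on either
side are unchanged. The sketch below is only an illustration. chop_unchecked
and chop_checked are hypothetical stand-ins for the real function, and the
guard size and fill value are arbitrary choices.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GUARD_BYTES 16
#define BUF_BYTES   32
#define GUARD_FILL  0xA5

/* Hypothetical function under test, reproducing the bug: n_pos == 0
 * makes it write to sz[-1]. */
static void chop_unchecked(char *sz, int n_pos)
{
    sz[n_pos - 1] = '\0';
}

/* Hypothetical fixed variant: skips the write when n_pos is not positive. */
static void chop_checked(char *sz, int n_pos)
{
    if (n_pos > 0)
        sz[n_pos - 1] = '\0';
}

/* Calls fn on a buffer surrounded by guard zones inside one array, so a
 * small under- or overrun stays inside memory the test owns.
 * Returns 1 if both guard zones are intact afterwards, 0 otherwise. */
static int guards_intact_after(void (*fn)(char *, int), int n_pos)
{
    unsigned char arena[GUARD_BYTES + BUF_BYTES + GUARD_BYTES];
    size_t i;

    memset(arena, GUARD_FILL, sizeof arena);
    fn((char *)arena + GUARD_BYTES, n_pos);

    for (i = 0; i < GUARD_BYTES; i++) {
        if (arena[i] != GUARD_FILL)
            return 0;           /* write before the buffer */
        if (arena[GUARD_BYTES + BUF_BYTES + i] != GUARD_FILL)
            return 0;           /* write past the buffer */
    }
    return 1;
}
```

A test would then assert that guards_intact_after(chop_checked, 0) is 1.
The same call with chop_unchecked returns 0. This only catches stray writes
that land within GUARD_BYTES of the buffer, and only for inputs the test
actually exercises.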
Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
| |
| Henrik Goldman 2006-08-15, 4:00 am |
| > By careful coding and/or using automated error detection tools.
Thank you for your description of the different tools.
The specific bug was on a stack variable. Thus by overwriting array[-1] I
would overwrite one byte from the previous stack frame which would be
another local variable.
> It showed up on only one of 6 platforms we build on, and only in one
> "special" build. On all other platforms/builds the bug has survived
> for 5 years (it was never observed).
> We found it with Insure++ in about 5 minutes.
Thanks for the suggestion. By your description only Insure++ would help. In
my case the bug survived almost two years.
I thought VS2005 also had some stack protection? It didn't detect anything
in the tests I ran. I thought Microsoft had made some security-related
enhancements, especially by installing "security cookies" to detect
overwriting of the stack.
-- Henrik
| |
| Bjorn Reese 2006-08-15, 4:00 am |
| Paul Pluzhnikov wrote:
> Purify will detect it if szString is dynamically-allocated or global.
It should be added that Purify does detect stack problems (in this case
a stack array bounds write) on some platforms (Solaris for example).
--
mail1dotstofanetdotdk
| |
| Paul Pluzhnikov 2006-08-15, 8:01 am |
| "Henrik Goldman" <henrik_goldman@mail.tele.dk> writes:
>
> Thank you for your description of the different tools.
There is one (newer) tool I forgot to mention: coverity.com ...
Their claim to fame is that they find bugs by static code analysis,
so you don't even need a test case that "exercises" the bug.
> The specific bug was on a stack variable. Thus by overwriting array[-1] I
> would overwrite one byte from the previous stack frame which would be
> another local variable.
It is extremely unlikely that "array[-1]" belongs to the previous frame.
Much more likely it is in the "register save area" of the current frame.
> I thought also VS2005 had some stack protection?
It does, but you must enable it with /GS
> I thought microsoft did some security related
> enhancements esp by installing "security cookies" to detect overwriting of
> stacks.
AFAICT, it detects only *some* overwrites past current frame's %esp/%rsp.
Only overwrites of return address, IIRC.
Again, "array[-1]" is somewhat unlikely to extend that far.
Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
| |
| Paul Pluzhnikov 2006-08-15, 8:01 am |
| Bjorn Reese <breese@see.signature> writes:
> It should be added that Purify does detect stack problems (in this
> case a stack array bounds write) on some platforms (Solaris for
> example).
It does detect writes beyond stack pointer.
So, on a machine with stack growing down, "array[-1000]" will very
likely be detected (if there are no other large arrays in the
current frame), but "array[-1]" and "array[100]" (where the size of the
array is smaller than 100) almost certainly will not be.
Cheers,
--
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.