Fixing hard-to-reproduce bugs
| Hello,
I have an app that runs "forever" in a while ( 1 ) loop handling
incoming packets at 40 Mbit/s. It runs as expected when I start it, and
continues to run as expected for the next several hours.
However, after what seems to be a random time (between 30 and 90 hours)
its memory size starts growing until there is no more memory available.
How do you debug hard-to-reproduce bugs that only occur after several
days of run-time?
I've been pulling my hair out over this bug for a few weeks now ;-(
Regards.
| |
| Rainer Temme 2007-06-18, 7:06 pm |
| Spoon wrote:
> However, after what seems to be a random time (between 30 and 90 hours)
> its memory size starts growing until there is no more memory available.
What type of memory is growing?
- stack ... the program is eventually digging into an
endless recursion.
- heap ... use a malloc-checker (electric fence and the like).
- code ... are you loading shared libraries dynamically?
On Linux systems (or others with /proc) /proc/_pid_/status
might help to find out.
Also, while your process is hogging memory, does it
still process all arriving data, or is it already
failing to do so?
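
The /proc check could also be done from a small helper, so that the breakdown can be logged over a run lasting days. This is only a minimal sketch, assuming the Linux layout of /proc/_pid_/status, where lines such as "VmData:", "VmStk:", "VmExe:" and "VmLib:" give sizes in kB. The function names are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Return the value (in kB) of one field of an already opened
 * /proc/<pid>/status stream, e.g. "VmData", or -1 if the field is absent.
 * Assumes the Linux format: "VmData:\t    1234 kB". */
long status_field_kb(FILE *fp, const char *field)
{
    char line[256];
    size_t len = strlen(field);
    long kb = -1;

    rewind(fp);                      /* always scan from the first line */
    while (fgets(line, sizeof line, fp) != NULL) {
        /* Match the whole field name, not just a prefix of it. */
        if (strncmp(line, field, len) == 0 && line[len] == ':') {
            if (sscanf(line + len + 1, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }
    return kb;
}

/* Print total, resident, data (includes the heap), stack, code and
 * library sizes for a process. `pid` is a decimal pid or "self".
 * Returns 0 on success, -1 if the status file cannot be opened. */
int print_memory_breakdown(const char *pid, FILE *out)
{
    static const char *const fields[] = {
        "VmSize", "VmRSS", "VmData", "VmStk", "VmExe", "VmLib"
    };
    char path[64];
    FILE *fp;
    size_t i;

    snprintf(path, sizeof path, "/proc/%s/status", pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    for (i = 0; i < sizeof fields / sizeof fields[0]; i++)
        fprintf(out, "%-7s %ld kB\n", fields[i],
                status_field_kb(fp, fields[i]));
    fclose(fp);
    return 0;
}
```

Calling print_memory_breakdown("self", stderr) from inside the program, or passing the pid of the running process from a separate monitor, shows which of stack, data or code is the one that keeps growing.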
| |
|
| Rainer Temme wrote:
> Spoon wrote:
>
>
> What type of memory is growing?
I used very crude means to monitor memory usage: I left top running in a
shell and sorted by resident set size. This will lump stack and heap
size, but should not count shared libraries, unless I am mistaken.
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 4027
virtual memory (kbytes, -v) unlimited
Since stack size is limited to 8 MB, and since my process consumes more
than 250 MB once I tickle the bug, I would have to assume that I am
consuming (either leaking or over-allocating) heap.
> - stack ... the program is eventually digging into an
> endless recursion.
There is no explicit recursion in my code.
I'll double check for any implicit recursion.
> - heap ... use a malloc-checker (electric fence and the like).
I tried valgrind but the CPU cannot handle both valgrind and my process
at the same time. Are you suggesting setting MALLOC_CHECK_ perhaps? As
far as I understand, this will not help detect memory leaks, right?
> - code ... are you loading shared libraries dynamically.
I don't use dlopen() and co.
> On Linux systems (or others with /proc) /proc/_pid_/status
> might help to find out.
I'm pretty sure this is a heap issue. However, I have no idea what
triggers the bug, and where in the code memory is consumed.
> Also, while your process is hogging memory, does it
> still process all arriving data, or is it already
> failing to do so?
Hard to say.
Regards.
| |
| Bin Chen 2007-06-18, 7:06 pm |
| On Jun 18, 9:01 pm, Spoon <root@localhost> wrote:
> Rainer Temme wrote:
>
>
>
> I used very crude means to monitor memory usage: I left top running in a
> shell and sorted by resident set size. This will lump stack and heap
> size, but should not count shared libraries, unless I am mistaken.
>
> $ ulimit -a
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> file size (blocks, -f) unlimited
> max locked memory (kbytes, -l) 32
> max memory size (kbytes, -m) unlimited
> open files (-n) 1024
> pipe size (512 bytes, -p) 8
> stack size (kbytes, -s) 8192
> cpu time (seconds, -t) unlimited
> max user processes (-u) 4027
> virtual memory (kbytes, -v) unlimited
>
> Since stack size is limited to 8 MB, and since my process consumes more
> than 250 MB once I tickle the bug, I would have to assume that I am
> consuming (either leaking or over-allocating) heap.
>
>
> There is no explicit recursion in my code.
> I'll double check for any implicit recursion.
>
>
> I tried valgrind but the CPU cannot handle both valgrind and my process
> at the same time. Are you suggesting setting MALLOC_CHECK_ perhaps? As
> far as I understand, this will not help detect memory leaks, right?
Yes, MALLOC_CHECK_ will not report anything about the memory leak, but
I still strongly suggest you use it. You may find some other interesting
problems, because memory corruption can cause all kinds of strange
behavior; the result is just *unpredictable*.
| |
|
| On Mon, 18 Jun 2007 15:01:37 +0200, Spoon wrote:
> Rainer Temme wrote:
>
Almost certainly heap. Also note that mmap() (if you use it) also adds up
to your VSIZE.
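
To see the effect on a Linux box, here is a rough sketch that maps anonymous memory without touching it and compares the sizes before and after. It assumes /proc/self/statm, whose first two fields are the total and resident sizes in pages. The function names are invented for the example. The point is that untouched mappings grow VSIZE but hardly change the resident set size that top sorts by.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std=c99/c11 */
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Read virtual size and resident set size of this process, in kB,
 * from /proc/self/statm (Linux-specific). Returns 0 on success. */
int read_statm_kb(long *vsize_kb, long *rss_kb)
{
    long pages_total, pages_resident;
    long page_kb = sysconf(_SC_PAGESIZE) / 1024;
    FILE *fp = fopen("/proc/self/statm", "r");
    int n;

    if (fp == NULL)
        return -1;
    n = fscanf(fp, "%ld %ld", &pages_total, &pages_resident);
    fclose(fp);
    if (n != 2)
        return -1;
    *vsize_kb = pages_total * page_kb;
    *rss_kb = pages_resident * page_kb;
    return 0;
}

/* Map `len` bytes of anonymous memory, leave it untouched, and report
 * how much VSIZE and RSS grew (in kB). The mapping is released before
 * returning. Returns 0 on success, -1 on any failure. */
int growth_after_mmap(size_t len, long *vsize_growth_kb, long *rss_growth_kb)
{
    long v0, r0, v1, r1;
    void *p;

    if (read_statm_kb(&v0, &r0) != 0)
        return -1;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (read_statm_kb(&v1, &r1) != 0) {
        munmap(p, len);
        return -1;
    }
    munmap(p, len);
    *vsize_growth_kb = v1 - v0;
    *rss_growth_kb = r1 - r0;
    return 0;
}
```

So if you only watch RSS, a large mapping that is never touched will not show up there, while pages that are actually written will.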
If your application has a logfile, try to correlate the 30-90 hours with
a specific event. Maybe some code that is only used once a certain
condition is true (eg overflow of hashtables). Or the inability to find
stored items, causing them to be stored again and again.
(lack of) Deallocation of "nested" datastructures is also a probable
candidate. Adding a counter for each structure-type and incrementing /
decrementing it in their "constructor" / "destructor", and dumping the
scoreboard to a file periodically is IMHO a good way to monitor your
program's behavior. For testing purposes, it will be handy if you can
artificially lower the resource limits, or allocated sizes.
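
A rough sketch of the counter idea, assuming a single-threaded program. The types "packet" and "session" and all function names are hypothetical placeholders for whatever structures your application really allocates.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical structure types; substitute the application's own. */
enum obj_type { OBJ_PACKET, OBJ_SESSION, OBJ_TYPE_COUNT };

static const char *const obj_name[OBJ_TYPE_COUNT] = { "packet", "session" };
static long live_count[OBJ_TYPE_COUNT];   /* objects currently allocated */

struct packet {
    size_t len;
    unsigned char *data;                  /* nested allocation */
};

/* "Constructor": count the object only once it is fully allocated. */
struct packet *packet_new(size_t len)
{
    struct packet *p = malloc(sizeof *p);

    if (p == NULL)
        return NULL;
    p->data = malloc(len ? len : 1);
    if (p->data == NULL) {
        free(p);
        return NULL;
    }
    p->len = len;
    live_count[OBJ_PACKET]++;
    return p;
}

/* "Destructor": release the nested buffer too, then uncount. */
void packet_free(struct packet *p)
{
    if (p == NULL)
        return;
    free(p->data);
    free(p);
    live_count[OBJ_PACKET]--;
}

long live_objects(enum obj_type t)
{
    return live_count[t];
}

/* Write one scoreboard line, e.g. "1182171600 packet=12 session=0".
 * Meant to be called periodically from the main loop.
 * Returns 0 on success, -1 on a write error. */
int dump_scoreboard(FILE *out, long timestamp)
{
    int i;

    if (fprintf(out, "%ld", timestamp) < 0)
        return -1;
    for (i = 0; i < OBJ_TYPE_COUNT; i++)
        if (fprintf(out, " %s=%ld", obj_name[i], live_count[i]) < 0)
            return -1;
    if (fputc('\n', out) == EOF)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

A counter that climbs steadily once the bug kicks in points at the structure type that is no longer being released. To make the failure show up sooner, the virtual memory limit listed by ulimit -a can be lowered in the shell with ulimit -v before starting the program.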
> I don't use dlopen() and co.
Wild guess: fdopen() ???
>
> Hard to say.
If you cannot tell whether your program works or not, how could you try to
improve it ?
HTH,
AvK