Code Comments
Dear Unix Experts,

I have a bug somewhere that causes my code to consume all available CPU time, but it does it in a bizarre way. When I look at the output of "top", I see that overall CPU time is approximately 30% user and 70% system, and the load average is just over 1. But none of the listed processes shows any significant %CPU: no process "owns" the excessive processing time.

Of course I know which process is responsible for it - it's the code I'm currently hacking, and killing it does return things to normal. I suspect that the problem is a loop that is calling select() over and over again or something like that, and I'll eventually track it down. But the fact that this CPU time is not being attributed to it is mysterious, and I wonder if it is giving me a clue about what is going wrong. Has anyone else ever seen this behaviour?

This is with Linux, 2.6.3 kernel.

Cheers, Phil.
phil_gg04@treefic.com writes:

> Dear Unix Experts,
>
> I have a bug somewhere that causes my code to consume all available CPU
> time, but it does it in a bizarre way. When I look at the output of
> "top", I see that overall CPU time is approximately 30% user and 70%
> system, and the load average is just over 1.

Sounds like it's stuck in a loop doing system calls repeatedly.

> But none of the listed processes shows any significant %CPU: no
> process "owns" the excessive processing time.
>
> Of course I know which process is responsible for it - it's the code
> I'm currently hacking, and killing it does return things to normal. I
> suspect that the problem is a loop that is calling select() over and
> over again or something like that, and I'll eventually track it down.
> But the fact that this CPU time is not being attributed to it is
> mysterious, and I wonder if it is giving me a clue about what is going
> wrong. Has anyone else ever seen this behaviour?
>
> This is with Linux, 2.6.3 kernel.

The behavior you describe is normal with old versions of "top" and an NPTL-enabled kernel/libc. What happens is that a thread other than the first is using the CPU time, and the tools don't know where to look to find out. Upgrade your procps package and/or try "ps ux -T". That command will list all the threads of each process, and should show the CPU usage correctly.

To find the actual bug, try using strace to pinpoint the exact system call. If you're lucky, it's one that's called from few places in your code.

--
Måns Rullgård
mru@inprovide.com
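To illustrate the per-thread accounting mentioned above, here is a minimal Python sketch that reads the user and system CPU counters for every thread of a process from /proc/<pid>/task/<tid>/stat. It assumes a Linux /proc layout as described in proc(5); the function names `parse_task_stat` and `thread_cpu_times` are made up for this example and are not part of procps or any other tool.

```python
import os
import sys


def parse_task_stat(line):
    """Parse one stat line into (tid, name, utime_ticks, stime_ticks)."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split at the first "(" and the last ")".
    head, _, rest = line.rpartition(")")
    tid_text, _, name = head.partition("(")
    fields = rest.split()
    # fields[0] is field 3 (state) of proc(5); utime and stime are
    # fields 14 and 15, which puts them at indexes 11 and 12 here.
    return int(tid_text), name, int(fields[11]), int(fields[12])


def thread_cpu_times(pid):
    """Return (tid, name, user_seconds, system_seconds), busiest first."""
    task_dir = f"/proc/{pid}/task"
    try:
        tids = os.listdir(task_dir)
    except OSError:
        return []  # no such process, or no Linux-style /proc available
    ticks = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    rows = []
    for tid in tids:
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                tid_num, name, utime, stime = parse_task_stat(f.read())
        except OSError:
            continue  # the thread exited while we were scanning
        rows.append((tid_num, name, utime / ticks, stime / ticks))
    return sorted(rows, key=lambda row: row[2] + row[3], reverse=True)


if __name__ == "__main__":
    # Hypothetical usage: pass a PID, or inspect this script's own process.
    target = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for tid, name, user_s, sys_s in thread_cpu_times(target):
        print(f"{tid:>8} {name:<16} user={user_s:.2f}s system={sys_s:.2f}s")
```

If one thread ID shows a system time that keeps climbing between runs while the others stay flat, that is the thread worth attaching strace to.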
>> overall CPU time is approximately 30% user and 70% system

> The behavior you describe is normal with old versions of "top" and an
> NPTL enabled kernel/libc. What happens is that another thread than
> the first is using the CPU time, and the tools don't know where to
> look to find out.

Ah, an instrumentation failure! Thanks Måns, I'll upgrade and see if that reports it properly. This is indeed a multi-threaded application.

Cheers, Phil.
On Mon, 30 May 2005 06:49:33 -0700, phil_gg04 wrote:

> I have a bug somewhere that causes my code to consume all available CPU
> time, but it does it in a bizarre way. When I look at the output of
> "top", I see that overall CPU time is approximately 30% user and 70%
> system, and the load average is just over 1. But none of the listed
> processes shows any significant %CPU: no process "owns" the excessive
> processing time.

There is another possible explanation besides the instrumentation defect suggested by Måns Rullgård. The scenario you describe may be due to short-lived processes. If processes begin and terminate in less than the sample time of the monitoring tools, the CPU time they consume can be hard to account for. The symptoms you report suggest that tasks (e.g., processes or threads) are being spawned and almost immediately exiting.
>> the load average is just over 1. But none of the listed

> may be due to short lived processes

Thanks Kurtis, that's possible. But I don't think it's the problem in my case, because looking at the process IDs allocated to new processes they are only increasing at a sensible rate. If there were short-lived processes I'd see big gaps between the PIDs of other processes.

--Phil.
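The PID-gap check described above can also be done numerically. The following is a rough Python sketch, assuming a Linux /proc/stat that contains the "processes" line (documented in proc(5) as the number of forks since boot). The helper names `parse_fork_count` and `forks_per_second` are hypothetical, and whether the counter also includes thread creation on a given kernel is something to verify rather than take from this example.

```python
import time


def parse_fork_count(stat_text):
    """Return the value of the 'processes' line from /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "processes":
            return int(parts[1])
    raise ValueError("no 'processes' line found")


def forks_per_second(interval=1.0, path="/proc/stat"):
    """Sample the fork counter twice and return the rate in between."""
    def read_counter():
        with open(path) as f:
            return parse_fork_count(f.read())

    before = read_counter()
    time.sleep(interval)
    after = read_counter()
    return (after - before) / interval


if __name__ == "__main__":
    try:
        print(f"forks per second: {forks_per_second():.1f}")
    except (OSError, ValueError) as exc:
        # Not a Linux-style /proc, or the counter is missing.
        print(f"could not read fork counter: {exc}")
```

A rate near zero while the CPU is busy argues against the short-lived-process explanation; a rate in the hundreds or thousands would support it.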