Code Comments
Does anyone know of a good code profiler for Linux? I need to know where a program is running slowly.

all the best, jeff
LinuxAsm <spamtrap@crayne.org> wrote in part:
> Does anyone know of a good code profiler for Linux?
> I need to know where a program is running slowly.

`gprof`

-- Robert
LinuxAsm <spamtrap@crayne.org> writes:
> Does anyone know of a good code profiler for Linux?
> I need to know where a program is running slowly.
> all the best, jeff

gprof. It does both active (function-level) and passive (sampling) profiling.

Phil
--
Dear aunt, let's set so double the killer delete select all.
-- Microsoft voice recognition live demonstration
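To make the "active (function-level)" half of that description concrete: active profiling means the program itself records every function entry, as opposed to being sampled from outside at intervals. The sketch below shows the idea in Python using the standard `sys.setprofile` hook. It is only an illustration of call counting, not how gprof is implemented, and the names `count_calls`, `square` and `sum_of_squares` are made up for the example.

```python
import sys
from collections import Counter


def count_calls(func, *args, **kwargs):
    """Run func and return (result, Counter of Python functions entered)."""
    calls = Counter()

    def hook(frame, event, arg):
        # "call" fires on entry to every Python-level function;
        # C-level calls arrive as "c_call" and are ignored here.
        if event == "call":
            calls[frame.f_code.co_name] += 1

    sys.setprofile(hook)
    try:
        result = func(*args, **kwargs)
    finally:
        # Always remove the hook, even if func raises.
        sys.setprofile(None)
    return result, calls


def square(x):
    return x * x


def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += square(i)
    return total


if __name__ == "__main__":
    result, calls = count_calls(sum_of_squares, 5)
    print(result, dict(calls))
```

Counts gathered this way are exact, but every function entry pays for the hook. Sampling keeps the overhead low at the price of statistical rather than exact results.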
LinuxAsm wrote:
> Does anyone know of a good code profiler for Linux?
> I need to know where a program is running slowly.
> all the best, jeff

"oprofile" (kernel feature)

JB
LinuxAsm wrote:
> Does anyone know of a good code profiler for Linux?

I recommend oprofile over gprof.

http://oprofile.sourceforge.net/about/
On Mar 20, 10:08 am, Noob <r...@localhost.news.free.fr> wrote:
> LinuxAsm wrote:
>
> I recommend oprofile over gprof.
>
> http://oprofile.sourceforge.net/about/

Valgrind is extremely good for both source-level and assembler-level profiling (you get an execution count for every single instruction!).

It also does cache profiling (both data and instruction) and memory allocation profiling.

KCachegrind is the perfect companion for examining and navigating Valgrind output.

The downside of Valgrind is that your program will run very slowly (even 30x slower) when profiling. But I have found it incomparably more useful than gprof. I have no experience with oprofile.

One may also want to look at Google's tcmalloc library for memory allocation profiling.

HTH,

-- gpd
gpderetta wrote:
> Noob wrote:
>
> Valgrind is extremely good for doing both source level
> and assembler level profiling (you get execution count
> for every single instruction!).
>
> It also does cache (both data and instruction) and memory
> allocation profiling.
>
> Kcachegrind is the perfect companion for examining and
> navigating valgrind output.
>
> The downside of valgrind is that your program will run very
> slowly (even 30x) when profiling. But I have found it incomparably
> more useful than gprof. I have no experience with oprofile.

If a profiler is too intrusive (high overhead) then it is not profiling the application, but the combination of the application AND the profiler itself. The impact of cache misses will be incorrectly reported because the timing is different, and the profiler itself will induce extraneous data and cache misses.

cf. http://en.wikipedia.org/wiki/Observer_effect

Regards.
On 24 Mar, 12:34, Noob <r...@localhost.news.free.fr> wrote:
> gpderetta wrote:
>
> If a profiler is too intrusive (high overhead) then it is not profiling
> the application, but the combination of the application AND the profiler
> itself. The impact of cache misses will be incorrectly reported because
> the timing is different, and the profiler itself will induce extraneous
> data and cache misses.
>
> cf. http://en.wikipedia.org/wiki/Observer_effect

Well, Valgrind doesn't compute profiling information by timing each instruction as executed by the actual CPU. Instead it fully emulates a virtual CPU, cache included. This virtual CPU is completely dedicated to executing the profiled program. The profiler doesn't run inside it, so it can't influence the results: for example, cache misses are computed as seen from the virtual cache, not the real cache. This is where the 30x cost comes from.

Of course, what you get is not the profile of the program running on the real CPU, but rather on the virtual CPU. There are differences, in fact: in the absence of cache misses, the cost of every instruction is, IIRC, always a single clock cycle, so a simple sum will be shown to cost as much as a long-latency division. Also, there is no branch prediction emulation. Differences in instruction cost are due only to cache miss estimation.

Still, what you get is a very precise idea of which modules, functions and even instructions are executed most in your program. You can also play with the cache settings and even profile your program as if it were running on a different CPU than your actual machine.

-- gpd
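The cost model described above can be sketched in a few lines: replay the program's memory accesses through a simulated cache, charge one unit per access, and add a penalty for each miss. The Python below is a toy illustration only. It assumes a direct-mapped cache, and `simulate`, the default sizes and the miss penalty of 10 are invented for the example; they are not Valgrind's actual model or defaults.

```python
def simulate(addresses, cache_size=1024, line_size=64, miss_penalty=10):
    """Replay a trace of byte addresses through a toy direct-mapped cache.

    Returns (misses, cost). Every access costs 1 unit and every miss adds
    miss_penalty units. All parameters are illustrative assumptions.
    """
    addresses = list(addresses)
    num_lines = cache_size // line_size
    tags = [None] * num_lines  # which memory block each cache line holds
    misses = 0
    for addr in addresses:
        block = addr // line_size   # memory block containing this byte
        index = block % num_lines   # the one cache line it can occupy
        if tags[index] != block:
            tags[index] = block     # miss: load the block, evict the old one
            misses += 1
    cost = len(addresses) + misses * miss_penalty
    return misses, cost


if __name__ == "__main__":
    # Both traces make 1024 accesses.
    sequential = range(0, 4096, 4)    # walk 4 KiB in 4-byte steps
    strided = range(0, 65536, 64)     # touch a new 64-byte line every time
    print(simulate(sequential))       # (64, 1664)
    print(simulate(strided))          # (1024, 11264)
    # Changing the virtual cache changes the profile, not the program.
    print(simulate(sequential, line_size=32))  # (128, 2304)
```

Because the result depends only on the address trace and the cache parameters, it is the same on every run and is unaffected by how slowly the simulation itself executes. The flip side, as noted above, is that it describes the simulated machine rather than the real one.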