For Programmers: Free Programming Magazines  


Home > Archive > A86 Assembler > September 2004 > Writing simple profiler









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Writing simple profiler
Glen Able

2004-09-17, 8:55 am

I'd like to knock together a simple profiler to use with some of my
VC6-based apps (my version of VC not having a profiler, apparently).

I'm a bit stumped on the 2 main parts of this, i.e.
a) How to set up an interrupt with a given frequency
b) How, in the interrupt handler, to grab the relevant data and exit without
breaking anything. I basically just need to grab the PC value, but can I be
sure that it is currently in my app's process?

Thanks for any clues you can give me.


Peter Koch Larsen

2004-09-17, 8:56 pm


"Glen Able" <spamtrap@crayne.org> skrev i en meddelelse
news:cgkdvj$i1s$1$8302bc10@news.demon.co.uk...
> I'd like to knock together a simple profiler to use with some of my
> VC6-based apps (my version of VC not having a profiler, apparently).
>
> I'm a bit stumped on the 2 main parts of this, i.e.
> a) How to set up an interrupt with a given frequency
> b) How, in the interrupt handler, to grab the relevant data and exit

without
> breaking anything. I basically just need to grab the PC value, but can I

be
> sure that it is currently in my app's process?
>
> Thanks for any clues you can give me.
>
>

Just out of curiosity, Glen. What parts are you not stumped on?

/Peter

Robert Wessel

2004-09-18, 3:55 am

"Glen Able" <spamtrap@crayne.org> wrote in message news:<cgkdvj$i1s$1$8302bc10@news.demon.co.uk>...
> I'd like to knock together a simple profiler to use with some of my
> VC6-based apps (my version of VC not having a profiler, apparently).
>
> I'm a bit stumped on the 2 main parts of this, i.e.
> a) How to set up an interrupt with a given frequency
> b) How, in the interrupt handler, to grab the relevant data and exit without
> breaking anything. I basically just need to grab the PC value, but can I be
> sure that it is currently in my app's process?
>
> Thanks for any clues you can give me.



Simple, huh? OK add a high priority thread to your application that
blocks for a bit, then does a SuspendThread on the thread you want to
profile, then get the current eip value with GetThreadContext, and
then does a ResumeThread. A high resolution timer (eg. the multimedia
timers) with a callback function are probably a good way to drive this
process.

Matt

2004-09-19, 3:55 am

"Glen Able" <spamtrap@crayne.org> wrote in message
news:cgkdvj$i1s$1$8302bc10@news.demon.co.uk...
> I'd like to knock together a simple profiler to use with some of my
> VC6-based apps (my version of VC not having a profiler, apparently).
>
> I'm a bit stumped on the 2 main parts of this, i.e.
> a) How to set up an interrupt with a given frequency
> b) How, in the interrupt handler, to grab the relevant data and exit
> without
> breaking anything. I basically just need to grab the PC value, but can I
> be
> sure that it is currently in my app's process?


That's not a terribly great way to accomplish what you want. You can grab a
number of samples of the instruction pointer, but how is that going to give
you useful information? If you get a large number of samples around the same
EIP value, then you've probably found the hotspot. That's not really good
measurement, though.

Depending on your needs, this solution may work better. I created a class
which times a function in cycles. The idea is that the compiler will call
the destructor automatically at the end of the scope, so you can instantiate
the class in a for loop to obtain the amount of time spend in the for loop.
(This gives you more information than the above -- you know whether the loop
is taking all the time or your function is simply getting called often.)

The code looks something like this:
struct PROFILECTR
{
unsigned __int64 ticks;
};

template<PROFILECTR *ctr>
class PROFILER
{
PROFILER()
{
_asm
{
rdtsc
mov DWORD PTR [start], eax
mov DWORD PTR [start+4], edx
}
}

~PROFILER()
{
_asm
{
rdtsc
sub eax, DWORD PTR [start]
sbb edx, DWORD PTR [start+4]

add DWORD PTR [ctr], eax
adc DWORD PTR [ctr], edx
}
}

unsigned long long start;
};

Usage of it goes like this:
PROFILECTR ctr;

void TimeMe(void)
{
PROFILER<&ctr> p;

// ...
}

I ended up using macros to vastly simplify things.

-Matt

Markus Humm

2004-09-19, 3:55 pm

[snip]

If you're under Windows, I think you won't use an interrupt directly.
You'd use timers instead...

Greetings

Markus

Ivan Korotkov

2004-09-21, 3:56 am

> I don't think so. Timers are processed in normal Windows code, so there
> is
> no way to figure out what you were doing when the timer went off.
>
> Interrupts are the answer, but you need kernel help to do it.


For a *simple* profiler, debugging API are a better answer.

--
Ivan

e-mail me at: korotkov2 at ztel dot ru

Glen Able

2004-09-22, 3:55 am


"Matt" <spamtrap@crayne.org> wrote in message
news:XR63d.40267$uN5.6831@tornado.tampabay.rr.com...
> "Glen Able" <spamtrap@crayne.org> wrote in message
> news:cgkdvj$i1s$1$8302bc10@news.demon.co.uk...
I[color=darkred]
>
> That's not a terribly great way to accomplish what you want. You can grab

a
> number of samples of the instruction pointer, but how is that going to

give
> you useful information? If you get a large number of samples around the

same
> EIP value, then you've probably found the hotspot. That's not really good
> measurement, though.


Made me jump to see my post finally appear - I sent it on 26 August!

Anyway, sampling EIP a few tens of thousand times a second should give a
pretty accurate idea of what percentage of the time my code's spending in
every function. Of course, to convert the saved EIP values to something
useful, I'll need to parse the map file VC creates, and do some
name-undecorating.

> Depending on your needs, this solution may work better. I created a class
> which times a function in cycles. The idea is that the compiler will call
> the destructor automatically at the end of the scope, so you can

instantiate
> the class in a for loop to obtain the amount of time spend in the for

loop.
> (This gives you more information than the above -- you know whether the

loop
> is taking all the time or your function is simply getting called often.)
>
> The code looks something like this:


<snipped code>

> I ended up using macros to vastly simplify things.


I briefly considered manually instrumenting the code like you say, but was
really hoping to avoid having to modify the app at all (although, of course,
I'd have to do something like this to get nice, hierarchical profiling).
Your solution is very neat, though - esp. the way you exploit the scoping.
Btw, another reason I prefer my sampling approach is that it should be less
intrusive where I'm using lots of tiny inlined (hopefully) methods. Anyway,
this is only a spare-time project, so will carry on thinking for a bit...

thanks Matt!
g.a.


Robert Wessel

2004-09-22, 3:55 am

OK, it looked like an interesting execise, and I was bored
watching a big build. Seems to work fairly well on the
sample program. Not fancy, but...

Separate the three following source files, and compile with
something like:

cl calce.c sprofile.c -W3 -WX -MT -Fc -link -Map

Run sample program "calce". Profile data spit out at end.
"e" computed to 20000 decimal places (to increase runtime,
increase EDIGITS in calce.c).



******sprofile.h
void sprof_start(void *start, void *end, long nbuckets, HANDLE hthread);
void sprof_end(void);


******sprofile.c
// Simple Profiler
// Start profiling with sprof_start, passing the range of addresses in which
// samples are to be taken, the number of sample buckets into which this area
// should be divided, and the handle to the thread which is to be sampled
// (null = calling thread). Finish with sprof_end.

// More Windows, x86 and 32 bit dependencies than you can shake a stick at.

#include <windows.h>
#include <process.h>
#include <stdio.h>
#include "sprofile.h"

// Globals
unsigned long *buckets;
unsigned long startaddress;
unsigned long endaddress;
unsigned long bucketsize;
long bucketcount;
HANDLE hprofilethread;
HANDLE hcallbackthread;

// Some quick and dirty semaphores
volatile int profilethreadrunning = 0;
volatile int profilestoppending = 0;
volatile int profilethreadstopped = 0;


DWORD WINAPI sprof_thread(LPVOID p1);
void CALLBACK sprof_callback(UINT wTimerID, UINT msg,
DWORD dwUser, DWORD dw1, DWORD dw2);
void sprof_error(char *locus);


void sprof_start(void *start, void *end, long nbuckets, HANDLE hthread)
{

HRESULT rc;

// If no thread to profile is passed, profile current thread
if (!hthread)
hthread = OpenThread(THREAD_ALL_ACCESS, 0, GetCurrentThreadId());

buckets = calloc(nbuckets+1, sizeof(long));
if (!buckets)
sprof_error("Calloc failed");

startaddress = (unsigned long)start; // yeah, yeah, address converted to long
endaddress = (unsigned long)end;
bucketcount = nbuckets;
bucketsize = (endaddress-startaddress+bucketcount-1)/bucketcount;
hprofilethread = hthread;

rc = timeBeginPeriod(1); // Try to set 1ms period
if (rc != TIMERR_NOERROR)
sprof_error("BeginPeriod failed");

hcallbackthread = (HANDLE)_beginthreadex(NULL, 0, sprof_thread, NULL, 0, NULL);
if (!hcallbackthread)
sprof_error("_beginthreadex failed");
if (!SetThreadPriority(hcallbackthread, THREAD_PRIORITY_HIGHEST))
sprof_error("SetThreadPriority failed");

for (;;)
{
Sleep(100);
if (profilethreadrunning)
break;
}
}


DWORD WINAPI sprof_thread(LPVOID p1)
{
HRESULT rc;
HRESULT htimer;

htimer = timeSetEvent(1, 1, sprof_callback, 0,
TIME_PERIODIC | TIME_CALLBACK_FUNCTION);
if (!htimer)
sprof_error("timeSetEvent failed");

profilethreadrunning = 1;

for (;;)
{
Sleep(250);
if (profilestoppending)
break;
}

rc = timeKillEvent(htimer);
if (rc != TIMERR_NOERROR)
sprof_error("timeKillEvent failed");

profilethreadstopped = 1;

return (0);
}


void CALLBACK sprof_callback(UINT wTimerID, UINT msg,
DWORD dwUser, DWORD dw1, DWORD dw2)
{
HRESULT rc;
CONTEXT threadcontext;
unsigned long addr;

// An open question is whther or not this routine can be called
// reentrantly by the multi-media timer support.

rc = SuspendThread(hprofilethread);
if (rc == 0xffffffff)
sprof_error("SuspendThread failed");

threadcontext.ContextFlags = CONTEXT_i386 | CONTEXT_CONTROL;
rc = GetThreadContext(hprofilethread, &threadcontext);
if (!rc)
sprof_error("GetThreadContext failed");

addr = threadcontext.Eip;
if (startaddress <= addr && endaddress >= addr)
{
addr -= startaddress;
addr /= bucketsize;
buckets[addr]++;
}

rc = ResumeThread(hprofilethread);
if (!rc)
sprof_error("Resume Thread failed");
}


void sprof_end(void)
{
long i;
unsigned long addr;

profilestoppending = 1;
for (;;)
{
Sleep(100);
if (profilethreadstopped)
break;
}


printf("\n\nBuckets (%i bytes each):\n", bucketsize);
addr = startaddress;
for (i=0; i<bucketcount; i++)
{
printf(" %08x(+%04x): %i\n", addr, (addr-startaddress), buckets[i]);
addr += bucketsize;
}

printf("\n\n");

free(buckets); // somebody check this return code
}



void sprof_error(char *locus)
{
char msg[256];
strcpy(msg, "sprofile: ");
strcat(msg, locus);
FatalAppExit(0, msg);
}


******calce.c
// Calculate E via the standard Taylor series
#include <windows.h>
#include <stdio.h>
#include "sprofile.h"


#define EDIGITS 5050
#define ELOGBASE 4
#define EBASE 10000 //10**ELOGBASE


unsigned long esum[EDIGITS], eterm[EDIGITS];


void propcarry(void);
void lastfunc(void);


int main(void)
{
unsigned long firstsig, edivisor;
int i, j;
FILE *outfile;

// Profile between main and lastfunc, subdivide into 100 buckets,
// sample this thread.
sprof_start(main, lastfunc, 100, NULL);

printf("Calculating E to %i digits \n\n\n", (int) EDIGITS*4);

for(i=1; i<EDIGITS; i++)
{
esum[i]=0;
eterm[i]=0;
}

esum[0]=2*EBASE/10;
eterm[0]=EBASE/10;
edivisor=2;
firstsig=0;

while(firstsig < EDIGITS)
{
if((edivisor%25) ==0)
printf("\x1b[A%8i %8i\n", edivisor, firstsig*ELOGBASE);

{
unsigned long divisorwork, tul;

divisorwork=0;
for(i=firstsig; i<EDIGITS; i++)
{
divisorwork+=eterm[i];
tul=divisorwork/edivisor;
divisorwork=(divisorwork%edivisor)*EBASE
;
esum[i]+=tul;
eterm[i]=tul;
}
}

while(eterm[firstsig] == 0 && firstsig < EDIGITS)
firstsig++;

if((edivisor%EBASE) == 0)
propcarry();

edivisor++;
}
printf("\x1b[A%8i %8i\n\n", edivisor, firstsig*ELOGBASE);

propcarry();

outfile = fopen("calce.out", "w");
fprintf(outfile, "\n");
fprintf(outfile, " ");
fprintf(outfile, "CALCE - Calculate E via Taylor Series\n");
fprintf(outfile, " ");
fprintf(outfile, "-------------------------------------\n");
fprintf(outfile, " ");
fprintf(outfile, " %i digits\n\n", (int) EDIGITS*ELOGBASE);
fprintf(outfile, "\n");

fprintf(outfile, "10^0 ");
for(i=0; i<(EDIGITS*ELOGBASE); i++)
{
int decdigit;

if(i>0)
{
if(((i)%(50*50)) == 0)
fprintf(outfile, "\n\f");
if(((i)%50) == 0)
fprintf(outfile, "\n10^-%-8i",i);
}

j = i%ELOGBASE;
decdigit = EBASE/10;
while(j>0)
{
j--;
decdigit /= 10;
}

fprintf(outfile, "%01d", (esum[i/ELOGBASE]/decdigit)%10);
if(i==0)
fprintf(outfile, ".");
}

fprintf(outfile, "\n\n Last Term = 1/%i!\n\n", edivisor-1);
fprintf(outfile, " Error Bound = +%i*10^-%i\n", edivisor-2, EDIGITS*ELOGBASE-1);
fprintf(outfile, "\x1a");

fclose(outfile);

sprof_end();

return 0;
}


void propcarry(void)
{
unsigned long carrywork;
int i;

for(i=EDIGITS; i>0 ;i--)
{
carrywork=esum[i];
esum[i]=carrywork%EBASE;
esum[i-1]+=carrywork/EBASE;
}
}


void lastfunc(void)
{
// function exists only to provide a label for sprof_start
}

Ivan Korotkov

2004-09-28, 3:56 am

> I don't think so. Timers are processed in normal Windows code, so there
> is
> no way to figure out what you were doing when the timer went off.
>
> Interrupts are the answer, but you need kernel help to do it.


For a *simple* profiler, debugging API are a better answer.

--
Ivan

e-mail me at: korotkov2 at ztel dot ru

Matt

2004-09-28, 3:56 am

"Robert Wessel" <spamtrap@crayne.org> wrote in message
news:bea2590e.0409221158.17e573c4@posting.google.com...
[...]
> GetCurrentThread will not work. It returns a pseudo-handle that
> always addresses the currently running thread, not a handle specific
> to the thread that called GetCurrentThread. It's actually kind of
> surprising you don't just hang, since the first thing the callback
> will do is suspend the thread it's running on!


Calling DuplicateHandle on GetCurrentThread creates a new handle for the
current thread that is not a pseudohandle. Assuming the code that's polling
is in another thread, I don't see the problem.

-Matt

Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2010 codecomments.com