Code Comments

Re: Interrupt latency
Stargazer wrote:

...
> I measured CPU clocks elapsed between the first assembly instruction
> executed at interrupt's entry point in IDT and beginning of the C code
> of user-defined interrupt handler and the result was a big
> surprise :-) It took about 2500 cycles despite that I have only a
> handful of assembly instructions before a call to user-supplied IRQ
> handler.

A naked IRQ (just EOI and IRET) takes about 200 cycles, if measured in
an unprotected environment (i.e. my OS, where all code runs at PL0).

One big cycle loss is in the PL-transition, especially if hardware
task-switches are in use.

> A little more testing showed that almost all cycles (2300+) were spent
> at access to a global variable (via ds:[] addressing). Nothing that
> accesses stack memory (push, pop, call, mov) makes a noticeable
> difference. Does anybody have an idea why this happens? I test on
> Celeron 2.8G in protected mode set up for flat model with paging
> disabled.

If there is no WBINVD instruction (~2000 cycles) in your code,
I can only guess what may be happening here:

user code runs PL=3...
IRQ          : ->PL=0
user hooks   : PL0->PL3
global access: PL3->PL0->PL3
end of hook  : PL3->PL0
IRET         : PL0->PL3

__
wolfgang



Wolfgang Kern
03-18-08 12:00 AM


Re: Interrupt latency
On Mar 17, 2:38 am, "Alexei A. Frounze" <spamt...@crayne.org> wrote:
> On Mar 16, 4:02 pm, Stargazer <spamt...@crayne.org> wrote:
>
> What are the min, max and average cycle counts (you need to repeat the
> measurement many times)?
> What are the numbers on other PCs?

The weird thing is that the difference between min and max is only about 10
cycles. That is, the results are quite consistent.
I haven't tested on other PCs yet.

> I wonder if it's SMIs. On my Dell Latitude D610 notebook an SMI (or a
> short burst thereof) may take up to ~240K cycles, which is ~150
> microseconds at 1.6 GHz; on the old Compaq Armada 7800 notebook it's
> only 12K cycles or ~40 microseconds at 300 MHz. Hardware bugfixes and
> control are moving into the CPU. :(

I don't know. It's a single instruction that accounts for over 2000
cycles. I can point to the instruction but don't understand the reason.
It's a read-modify-write (INC ds:[xxx]), and it must have something to do
with the instruction being RMW. Actually I have a BT ds:[xxx] (read)
several instructions before it, which doesn't cause anything abnormal.

It would be strange for an SMI to be triggered on each and every
hardware interrupt.
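
For scale, the cycle counts quoted in this thread convert to wall-clock time as in the following minimal C sketch. The function name cycles_to_us is made up for illustration, the clock rates are simply the ones mentioned in the posts, and the TSC is assumed to tick at the nominal core clock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a TSC cycle count to microseconds,
 * assuming the counter ticks at a constant rate of clock_hz. */
double cycles_to_us(uint64_t cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)cycles * 1e6 / clock_hz;
}
```

With the numbers above, cycles_to_us(240000, 1.6e9) gives 150 us and cycles_to_us(12000, 300e6) gives 40 us, while the 2300-cycle anomaly at 2.8 GHz works out to roughly 0.82 us.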


D


Stargazer
03-18-08 12:00 AM


Re: Interrupt latency
On Mar 17, 6:00 am, Cyril Novikov <spamt...@crayne.org> wrote:
> Stargazer wrote:
>
> That's unlikely. 2300 cycles is about 1us, no memory is that slow. That
> looks more like ISA bus access time. Do you have any I/O instructions or
> MMIO accesses there, by any chance? How do you measure cycles, did you
> remember to use synchronizing instructions with RDTSC?

I use RDTSC twice and display the difference between the two readings. I didn't
serialize RDTSC, but how much inaccuracy would that cause? I would accept
that it explains the +/- 10 cycles in the measurements, but not 2000.
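
On the serialization question: RDTSC is not a serializing instruction, so the processor may sample the counter before earlier instructions have finished. A common remedy is to execute CPUID, which is serializing, immediately before RDTSC. The following is a minimal sketch in C with GCC/Clang inline assembly for x86, not the code used in this thread; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read the TSC right after a serializing CPUID,
 * so earlier instructions have completed before the counter is sampled.
 * Builds only for x86 targets with GCC or Clang. */
uint64_t read_tsc_serialized(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__(
        "xorl %%eax, %%eax\n\t"   /* CPUID leaf 0 */
        "cpuid\n\t"               /* serializing instruction */
        "rdtsc"                   /* EDX:EAX = time-stamp counter */
        : "=a"(lo), "=d"(hi)
        :
        : "ebx", "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: smallest back-to-back delta over `runs` samples,
 * i.e. the cost of the measurement sequence itself. */
uint64_t min_tsc_overhead(int runs)
{
    assert(runs > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < runs; i++) {
        uint64_t t0 = read_tsc_serialized();
        uint64_t t1 = read_tsc_serialized();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Subtracting min_tsc_overhead() from a measured interval removes the fixed cost of the CPUID/RDTSC pair, which matters when the interval of interest is only a few hundred cycles.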
 
> That would be helpful.

The code is below. Only the relevant snippet is posted.
I omitted most of the irrelevant code (e.g. there is some logic following the
code that loops through a possible maximum of 16 chained interrupts, but
I currently don't chain anything). "TEST_INT_RESPONSE" is defined
above the code and "_int_received" is a public label that marks two
doublewords. The comments "; Ts from here..." mark the places where I inserted
the three-instruction sequence (rdtsc and 2 moves) and give the timestamp
difference I measured from there to the rdtsc in inline assembly in the C code.
It appears that the "inc dword [_running_irq]" accounts for all the mess.

BTW, now I notice that there is not only a read through ds:[] (bt), but
also two writes to ds:[] following the rdtsc, which I use to store the time
stamp for later comparison, and neither causes any anomaly. So it
just strengthens my suspicion that the RMW instruction has some strange
effect on more distant caches. Does anybody have an idea?

Thanks,
D


------------ start of code -----------------
handle_int:
push	eax
push	ebx

%IFDEF	TEST_INT_RESPONSE
;
;	Test latency for interrupt.
;	This measures the latency excluding the measurement code itself. With the
;	measurement code included, the latency is slightly higher.
;
push	edx
rdtsc
mov	[_int_received], eax
mov	[_int_received+4], edx

; Dummy instructions that take about the same time as instructions
; above
push	10h
jmp	dummy1
dummy1:
push	eax
push	ebx

; Enough, go ahead
add	esp, 12
pop	edx
;
;	End of test latency for interrupt
;
%ENDIF

mov	ebx, [esp+8]

;
; Acknowledge interrupt with PIC(s).
; After calling callbacks, to enable nesting of interrupt handlers
; according to priorities
; TODO: check rotation priorities of 8259 and other options
;
mov	al, CMD_EOI
out	PIC_MASTER, al
cmp	ebx, 8
jb	fin
out	PIC_SLAVE, al
fin:

;
; Ts from here to int handler is 2340-2350 cycles
;

bt	[_int_callbacks], ebx
jnc	return_int

;
; Ts from here to int handler is 2340-2350 cycles
;

; On stack saved already: EAX, EBX
; Don't trust C code, save all regs (except for EBP, ESP)
push	ecx
push	edx
push	esi
push	edi

;
; Ts from here to int handler is ~2340 (cycles)
;

%IF 0
push	ebx			; int_no
call	_call_int_callbacks
add	esp, 4
%ELSE
;
; Implement _call_int_callbacks in assembly
;

;;;;; Just to test where cycles go
;	push	eax
;	push	edx
;	rdtsc
;	mov	[_int_received], eax
;	mov	[_int_received+4], edx
;	pop	edx
;	pop	eax
;;;;;

inc	dword [_running_irq]

;
; Ts from here to int handler is 105-115 (cycles)
;
mov	eax, 10							; MAX_IRQ_HANDLERS
mov	ecx, eax

;
; Ts from here to int handler is 105-115 (cycles)
;

mul	ebx
lea	eax, [eax*4]

;
; Ts from here to int handler is 90-105 (cycles)
;

next_int_handler:
push	eax

;
; Ts from here to int handler is 90-105 (cycles)
;

call	dword [_isr_proc+eax]

[...]
%ENDIF
------------ end of code --------------
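
The index arithmetic just before the indirect call (mov eax, 10 / mul ebx / lea eax, [eax*4]) yields the byte offset of the first handler slot for the interrupt number in EBX. A rough C equivalent is sketched below, assuming _isr_proc is a flat array of 32-bit handler pointers with 10 slots per interrupt, as the posted assembly suggests. The C names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_IRQ_HANDLERS 10u  /* value taken from the posted assembly */
#define ISR_SLOT_SIZE     4u  /* assumption: 32-bit handler pointers */

/* Hypothetical helper: byte offset into _isr_proc of the first handler
 * slot for int_no, mirroring "mul ebx" followed by "lea eax, [eax*4]". */
uint32_t isr_first_slot_offset(uint32_t int_no)
{
    assert(int_no < 16u);  /* assumption: 16 IRQ lines (two 8259 PICs) */
    return int_no * MAX_IRQ_HANDLERS * ISR_SLOT_SIZE;
}
```

For example, isr_first_slot_offset(3) is 120, so the first handler for IRQ 3 would be fetched from _isr_proc+120.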


Stargazer
03-18-08 12:00 AM


Re: Interrupt latency
In article <7366de24-bc1c-47f1-9fc1-6de7d66d6e5a@s37g2000prg.googlegroups.com>,
Stargazer <spamtrap@crayne.org> wrote:
>Hi,
>
>I am writing my own real-time kernel for x86. Now I face something
>really strange (or maybe it's not; it has been some time since
>I was in the details of x86 microarchitecture).
>
>I measured CPU clocks elapsed between the first assembly instruction
>executed at interrupt's entry point in IDT and beginning of the C code
>of user-defined interrupt handler and the result was a big
>surprise :-) It took about 2500 cycles despite that I have only a
>handful of assembly instructions before a call to user-supplied IRQ
>handler.
>
>A little more testing showed that almost all cycles (2300+) were spent
>at access to a global variable (via ds:[] addressing).

Is this a SINGLE memory access or several/many accesses?

>Nothing that
>accesses stack memory (push, pop, call, mov) makes a noticeable
>difference. Does anybody have an idea why this happens? I test on
>Celeron 2.8G in protected mode set up for flat model with paging
>disabled.
>
>I can post the code of the interrupt's entry point (until the C entry
>point is called), but it's rather trivial and not long.

Please feel free to post the code so there doesn't have to be any
unnecessary speculation.

Patrick


(Patrick Klos)
03-18-08 12:00 AM


Re: Interrupt latency
On Mar 16, 8:00 pm, Cyril Novikov <spamt...@crayne.org> wrote:
> That's unlikely. 2300 cycles is about 1us, no memory is that slow. That
> looks more like ISA bus access time. Do you have any I/O instructions or
> MMIO accesses there, by any chance? How do you measure cycles, did you
> remember to use synchronizing instructions with RDTSC?

Actually, PCI reads tend to take about 1us, and they are generally
implemented as memory reads.

Are you reading an interrupt status register to check for the
interrupt condition, and that's the memory read that eats a us?

What is this memory location?


tbroberg_nospam@hifn.com
03-18-08 12:00 AM


Re: Interrupt latency
On Mar 17, 8:44 pm, Cyril Novikov <spamt...@crayne.org> wrote:
> Stargazer wrote:
>
> These 2 OUT's are what takes 2000+ cycles. The reason you see it elsewhere is
> most likely a bug in your measurement code.

Why would it take 2000+ cycles? The interrupt controller is integrated into
the chipset, which provides pretty fast access. Anyway, your assertion is
not true: if I just comment out the "inc dword [_running_irq]", I get a
total latency of about 300 cycles.


D


Stargazer
03-18-08 11:59 PM


Re: Interrupt latency
On Mar 17, 10:06 pm, "Marven Lee" <spamt...@crayne.org> wrote:
> Stargazer wrote:
>
> Is the Celeron 2.8G related to the Pentium 4? I'm not clued up on
> processors of the last few years.

Yes, it's a Celeron from the Pentium 4 family. However, I don't include the
processor's interrupt entry and IRET in the timing. For now I want my
interrupt entry code to introduce minimal additional latency, not deal
with matters that I can't change anyway :-)

As I said, I know where the problem lies, but I don't understand the
reason so I can't yet eliminate it.


D


Stargazer
03-18-08 11:59 PM


Re: Interrupt latency
On Mar 17, 12:12 pm, "Wolfgang Kern" <spamt...@crayne.org> wrote:
> Stargazer wrote:
>
> ...
>
> A naked IRQ (just EOI and IRET) takes about 200 cycles, if measured in
> an unprotected environment (i.e. my OS, where all code runs at PL0).
>
> One big cycle loss is in the PL-transition, especially if hardware
> task-switches are in use.

I use the "classical" monolithic embedded OS model: both the OS and the
application run at PL0, with no PL switches and no separate address spaces
(CR0.PG=0), and TSSs are not used.

200-300 cycles is what I get without that "inc" instruction.


D.


Stargazer
03-18-08 11:59 PM


Re: Interrupt latency
Cyril Novikov answered Stargazer:

removed comp.arch.embedded (can't access it from here)
 
 

> These 2 OUT's are what takes 2000+ cycles.
> The reason you see it elsewhere is most likely a bug in
> your measurement code.

I cannot confirm this, unless I/O-permission checking causes a heavy detour.
Otherwise it would mean a very slow chipset connected to a high-speed bus.
Last time I checked I got ~40 ns for legacy port access on PCI bridges.
So it should be no more than 8 additional cycles on a 200 MHz bus.
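
The arithmetic behind that estimate can be sketched in C as follows. The function name is hypothetical; the 2.8 GHz figure in the note below is the core clock of the Celeron mentioned earlier in the thread, assuming the TSC ticks at that rate.

```c
#include <assert.h>

/* Hypothetical helper: number of clock cycles that elapse in
 * duration_ns nanoseconds at a clock rate of clock_hz. */
double ns_to_cycles(double duration_ns, double clock_hz)
{
    assert(duration_ns >= 0.0 && clock_hz > 0.0);
    return duration_ns * 1e-9 * clock_hz;
}
```

ns_to_cycles(40, 200e6) gives the 8 bus cycles quoted above, and ns_to_cycles(40, 2.8e9) gives about 112 core cycles per port access. On that figure, the two OUTs together would cost on the order of 224 cycles.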

To Stargazer:
What device causes the IRQ?
An EOI sent only to the PIC will not reset the IRQ pin on the device.
If the PIT channel is set to level sense, the EOI may invoke more IRQs
for the whole duration of an IRQ pulse ... possibly endlessly
on devices other than the timer, RTC or retrace.

__
wolfgang



Wolfgang Kern
03-18-08 11:59 PM

