Home > Archive > Compilers > March 2004 > Re: Operation Latencies
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Re: Operation Latencies
|
|
| Michael Tiomkin 2004-03-27, 12:28 am |
| kernel_learner@yahoo.com (Learner) wrote in message news:<04-02-090@comp.compilers>...
> Many operations have variable latencies...so I was wondering how
> compilers do scheduling for VLIW and how they calculate the critical
> path of a data dependence graph when the latencies of operations are
> not known at compile time.. Some scheduling like s/w pipelining
> requires calculation of RecMII which needs to know the critical path
> in the ddg... How can you be sure you have the critical path when the
> latencies can vary?
Notice that most scheduling problems are at least NP hard
(e.g. basic block scheduling with constant latencies, loop scheduling
is related to the stopping problem, it's EXP-space hard for most
languages), and heuristics are usually used for solving them. One of
these heuristics is assuming constant latencies, e.g. Level 1 cache
hit for the loads.
BTW, you can have many critical paths in your DDG.
What is usually important is the max distance from the start/to the end
of the DDG etc.. If your VLIW isn't infinitely wide, your scheduling
wouldn't always be perfect, and the critical paths length is only a rough
lower bound for performance anyway.
> What are the usual approaches ?
First, the optimistic one, assuming most frequently appearing latencies.
Another possibility is to use a more detailed processor model and
compute better estimations of the latencies, at least for the top-down
scheduling of a DDG. It might need recomputation of the distances,
but recall that you might need it anyway when your scheduler sees
that it's not minimal length anymore.
| |
| Tommy Hoffner 2004-03-27, 12:28 am |
| Learner wrote:
> Many operations have variable latencies...so I was wondering how
> compilers do scheduling for VLIW and how they calculate the critical
> path of a data dependence graph when the latencies of operations are
> not known at compile time.. Some scheduling like s/w pipelining
> requires calculation of RecMII which needs to know the critical path
> in the ddg... How can you be sure you have the critical path when the
> latencies can vary?
As mentioned by the other answers, you can't.
> What are the usual approaches ?
I've basically sen two:
1. State the minumum latency for each instruction in the compiler. This
will keep complexity down. In a modern RISC/VLIW machine there is too
many dynamic properties anyway. If the machine have out-of-order
execution you can't really define the actual latency penalty anyway
since parts of it might be covered by the out-of-order mechanism, or you
might be bound by TLB misses or full store queues etc, etc.
2. Assign a latency to each occurence of each instruction before feeding
the code to the scheduler. This would make it possible to state longer
latencies for first access to a cache line and shorter latencies for
repeated accesses etc. I belive there would be quite a lot of tw ing
before you get something useful. Assigning different latencies for one
instruction in a loop (possibly different for different iterations :-).
If you are specifically concerned about pipelining loops you might set
latencies specifially for the involved instructions and use default (eg
lower bound) values for the rest.
Assuming upper bounds, as stated in another response might be a really,
really bad idea if you have a compiler that schedules before register
allocation, since this would increase you register pressure, increase
spill .... (Increasing the latency values in gcc would give this effect,
and even introduce spill in loops :-(
And covering 100+ cycles for every memory access is no fun anyway :-)
If you schedule you instructions after register allocation you could, at
least in theory, assume that memory latencies ar unbounded, and let the
scheduler move things appart as far as it can. At least you shouldn't be
worse of than you were by only scheduling after register allocation.
For calculating critical paths I would say that assuming lower bounds is
good enough, at least if you're working on a highly dynamic machine.
If you increase the precision of youre model of the machine in some
respect (in this case memory latency) you would have to do it on other
details to the same level. If you don't, you can be fairly certain that
those details you don't model will come back and haunt you (by making a
fairly large number of the possible cases actually run slower:-)
/Tommy
|
|
|
|
|