Code Comments
Programming Forum and web based access to our favorite programming groups.I'm interested in data structures for efficient overload resolution. If there was only one parameter to a function (and assuming single inheritance) it would be easy to build an index for efficient overload resolution. A tree structured the same as the heirarchy of the classes would handle it. I imagine that better balanced tree structures are possible using partial-ordering comparisons (is these two classes super-and-subclass, sub-and-superclass, equal, or other), at least for single inheritance cases. I'm certainly interested in reading any good papers you can point me to on this kind of thing, but this is only a very simplified case of the real problem. What I'm really interested in is the case with multiple independent parameters. I have very little idea how to even approach this problem. The only guess I can make is along the lines of treating each parameter as a separate dimension and using a spacial structure such as R-trees, quadtrees or whatever, but of course normal spacial indexes require that the keys are fully ordered along each dimension. I have done overload resolution stuff in the recent past, for a tool I wrote which is very similar to Rhys Weatherlys treecc - ie a code generator for supporting multiple dispatch with a single inheritance class heirarchy. However, the code I wrote for the overload resolution is extremely naive. It evaluates all possible cases, then uses a rule induction system to try to derive an efficient dispatch handler. Yes, the rule induction bit is probably overkill, but I did it anyway. It does minimise the average number of parameters explicitly tested in the dispatch handler, so it's not totally pointless. Getting back to the point, because it evaluates every possible combination, it's vulnerable to combinatorial explosion issues. Not good, but then the overload resolution needed to take priority rules and constraint conditions into account as well as the basic inheritance rules. Overall, I decided I don't care that much, and its just something I wrote for my own use anyway. But I now have a different problem with a very similar structure. It's not overload resolution in the normal compiler sense, but it has the same basic form. No arbitrary constraint conditions in this case, but resolution does need to take into account (effectively) several parameters with a single-inheritance type structure. Not sure whether I need to resolve ambiguities with a priority system - probably not, though. So far, it's a simpler problem that the one I've already solved. The trouble is, I need to cope with changes to the class heirarchies and the set of overloads in real time. Hence I cannot precalculate everything to build a simple decision tree. I need a more dynamic data structure. The actual application involves abstracting input methods so they can be easily reconfigured at run-time. The idea is to have a heirarchy of target processes and modes, and a heirarchy of physical/abstract input source types, with each event mapping taking a form analogous to a single function overload. If it sounds like I'm making things too complex, that's probably true, but the investigation is more out of curiosity than practical problem solving ;-) So, can anyone point me to some interesting papers along these lines? Thanks in anticipation.
Post Follow-up to this messageSteve Horne wrote: > I'm interested in data structures for efficient overload resolution. > What I'm really interested in is the case with multiple independent > parameters. I have very little idea how to even approach this problem. This may or may not be useful to your specific case, in so far as it may give another direction to look for ideas: I look at overload resolution as a sorting problem where the actual argument types provide the sorting key for a partial order. Any given argument type needs to be able to compare two parameter types, and return one of (<, >, =, <> (incompatible with either)), or possibly more nuanced for narrowing vs. widening etc., assuming the sorting function normalises this. There may be more argument types than there are parameter types (e.g. the 'nil' or 'null' type, in such languages that support it, usually cannot be a parameter type). However, multiple parameters doesn't (normally) mean a lexicographical ordering. Instead, the sorting function taking the key (composed of the actual argument types) should return the ordering if only some of the types in the key indicate an ordering; return incompatible if any indicate <>; etc. with details governed by the specifics of your overloading semantics. Overload resolution is then reduced to creating and maintaining a list of the "max" candidates, i.e. candidates which compare greater than any previously seen candidate and equal to one another. If at the end of the search, there are multiple candidates, then the result is ambiguous; no candidates, not compatible; a single candidate, job done. If the overload type ordering is sane, then this process should be linear in the number of candidates (and arguments). By sane, I mean that any candidate which compares greater than any previous candidate should also compare greater than any candidate equal to the previous candidate, preventing a potential quadratic mini-explosion. -- Barry -- http://barrkel.blogspot.com/
Post Follow-up to this messageOn Mar 14, 2:42 pm, Barry Kelly <barry.j.ke...@gmail.com> wrote: > There may be more argument types than there are > parameter types (e.g. the 'nil' or 'null' type, in such languages that > support it, usually cannot be a parameter type). Dealing with this is one of the reasons I wrote my utility rather than just using treecc. It's not too much of a hassle. I support three cases. The default is that null parameters trigger a standard error handler (normally an exception throw). If the parameter is flagged as optional, null parameters are valid and handled either using a specific null-handling implementation or by an implementation flagged as able to handle both null and non-null values. Strictly, the last two cases are not separate - even a null-value-only handling implementation may only cover a subset of cases (e.g. due to other parameters) and the resolver only requires that every case must be unambiguous - not that they must all handle nulls the same way. Before continuing, I should say that the basic idea that prompted me to post earlier is one that I'm dropping as just far too silly to persue even out of curiosity. Configurable input handling idea can certainly be considered a multiple-dispatch problem, but I don't need to change my 'class heirarchies' at run time. I might have multiple viewports, but for input translation I only care about the type of viewport, for instance, where all the types will have been decided at compile time. The input mappings can be changed at run-time, but only during special configuration-changing operations. That means that multiple dispatch (as opposed to compile-time overload resolution) is an even closer match - when the configuration is applied, I can precalculate a decision tree just as I do for my existing utility. Even so, I'm still interested in more efficient overload resolution strategies, so... > Overload resolution is then reduced to creating and maintaining a list > of the "max" candidates, i.e. candidates which compare greater than any > previously seen candidate and equal to one another. If at the end of the > search, there are multiple candidates, then the result is ambiguous; no > candidates, not compatible; a single candidate, job done. > > If the overload type ordering is sane, then this process should be > linear in the number of candidates (and arguments). By sane, I mean that > any candidate which compares greater than any previous candidate should > also compare greater than any candidate equal to the previous candidate, > preventing a potential quadratic mini-explosion. I'm a littlehere. When my naive approach resolves one particular call, it checks every single candidate. It looks linear at a glance, but it's not. An apparent ambiguity can arise early in the search only to be resolved later. For example, given a hierarchy with parent class p and child class c, consider the following sequence of candidates... 1 (p, c) 2 (c, p) 3 (c, c) When resolving for the call (c, c), candidates 1 and 2 will both appear valid with neither being more specific than the other. Therefore, when assessing candidate 3, it must be compared with both of these two earlier candidates. Only then can these candidates be eliminated. Comparisons must be done both with the call being resolved and with those previously identified candidates that have yet to be eliminated. Since the number of previously identified candidates can grow linearly with the number of candidates checked so far, the overall worst-case complexity is quadratic. The list of previously identified not-yet-eliminated candidates looks like your max-candidates list. Is that right? I'm
because I see quadratic time in something you're describing it as linear unless I do something insane. Perhaps this means that I've got something insane locked into my mindset that I can't see past? In my utility, the explosion issue is not this quadratic thing, but because I'm evaluating every case (every possible call) prior to generating a decision table. The number of cases grows exponentially with the number of parameters, of course, and polynomially with the number of possible run-time types for each parameter. And even that's just the detonator - the real bang happens when the rule induction tries to make sense of all those cases in order to build the decision tree. I'm now certain that my visions of adapting an ordered-key data structure to the job were based on confusing some half-remembered distinct meanings of the term 'partially ordered', and the idea of combining that with R-trees is just nonsense. I'm still thinking through some ideas based on organising the set of candidates in advance to avoid having to check them all for each call resolved, but I think I'll make sure the ideas are coherent before discussing them ;-)
Post Follow-up to this messageSteve Horne wrote:
> On Mar 14, 2:42 pm, Barry Kelly <barry.j.ke...@gmail.com> wrote:
> For example, given a hierarchy with parent class p and child
> class c, consider the following sequence of candidates...
>
> 1 (p, c)
> 2 (c, p)
> 3 (c, c)
>
> When resolving for the call (c, c), candidates 1 and 2 will both
> appear valid with neither being more specific than the other.
> Therefore, when assessing candidate 3, it must be compared with both
> of these two earlier candidates. Only then can these candidates be
> eliminated.
I don't see the reason for your "therefore". Did you omit it?
This is indeed what I meant about the overload type ordering being sane.
> Comparisons must be done both with the call being resolved
> and with those previously identified candidates that have yet to be
> eliminated.
Candidates 1 & 2 compare equal to one another. By sane, I mean that it
is only necessary to compare candidate 3 with candidate 1, because, by
my sanity rule (see quote above), you don't need to compare against 2.
It may be that your rules don't permit this, though. Can you explain in
more detail your resolution requirements, such as a counter-example,
i.e. some case where you *do* have to compare against 2 - and which in
my scheme, would make your system "insane"? :)
> In my utility, the explosion issue is not this quadratic thing, but
> because I'm evaluating every case (every possible call) prior to
> generating a decision table.
Yes, that sounds expensive.
> The number of cases grows exponentially
> with the number of parameters, of course, and polynomially with the
> number of possible run-time types for each parameter. And even that's
> just the detonator - the real bang happens when the rule induction
> tries to make sense of all those cases in order to build the decision
> tree.
Of course, many exponential and high polynomial algorithms work fine
when n is small. Can you give some numbers - number of types, average
compatibility rates (for any random type, what fraction of the total set
of types is it compatible with, on average), numbers of candidates,
numbers of parameters, cost of testing for compatibility?
Basically, given that some of the problems with exponential explosion
are both setup time and space for the approach you describe, I'm
wondering if there's a compromise possible: a shorter setup time with a
less explosive space, at the cost of doing a little max-finding at the
end.
Candidates can be trivially bucketed by number of parameters, so the
number of candidates is only interesting for any given fixed number of
parameters, so assume the below for each possible number of parameters:
* Let n = number of candidates
* Let m = number of arg types
* Let p = number of parameters
Create a sparse matrix C ("C" for compatible), indexed by arg types *
parameter index, where each element references a set of candidates,
sorted in some cheap total ordering (if intersections are to be computed
via sorting). Maximum size is m*p, but it should be smaller because I
would expect there isn't a candidate for every possible argument type in
every possible parameter position.
Then, for each argument type t, for each candidate c, for each parameter
index i, add c to the set C[t,i] if c's parameter at index i can take
argument type t.
Obviously, there can be no more than n*m*p elements of the sets in total
and n for any given set, but hopefully far less - if there aren't far
less, then there's little point building this structure.
Then, for overload resolution, you only need process the intersection of
each set from each C[t,i], or perhaps simply the smallest set returned
for any of the parameter indexes. If the set size is ever 1, then the
job is over quite early.
Computing the intersection might be more expensive than warranted,
depending on how costly the max-finding overload resolution process is.
This overall should work out to (worst case) O(n*m*p*p) in size, and,
depending on the nature of the candidates and types, a greatly-reduced n
when doing the O(n) max-finding. I'm guessing that the real-world
n*m*p*p should be quite a bit less than m^p, unless I've misunderstood
the nature of your problem - that n is large in your case.
On the lookup side, it should be O(p+f*n), where f is the fraction that
the preprocessing work reduces the final resolution max-finding to.
In other words, I think the above would be madness in a real-world
compiler with e.g. Java-style overloading and typically only a small
handful of candidates that ever need to be considered, but it may make
sense for your case.
-- Barry
--
http://barrkel.blogspot.com/
Post Follow-up to this messageSteve Horne <stephenhorne100@aol.com> wrote in news:08-03-063@comp.compilers: > When my naive approach resolves one particular call, it checks every > single candidate. It looks linear at a glance, but it's not. An > apparent ambiguity can arise early in the search only to be resolved > later. For example, given a hierarchy with parent class p and child > class c, consider the following sequence of candidates... > > 1 (p, c) > 2 (c, p) > 3 (c, c) > > When resolving for the call (c, c), candidates 1 and 2 will both > appear valid with neither being more specific than the other. > Therefore, when assessing candidate 3, it must be compared with both > of these two earlier candidates. Only then can these candidates be > eliminated. Comparisons must be done both with the call being resolved > and with those previously identified candidates that have yet to be > eliminated. Since the number of previously identified candidates can > grow linearly with the number of candidates checked so far, the > overall worst-case complexity is quadratic. The usual way to do this is to compare each function against the list of best matches for each argument irrespective of which function they came from. That way you get O(nm) complexity in the number of candidates and number of arguments. So after looking at #2, you'll have (c, c) as your best match argument types and no function that matches it. #3 will then match. Chris Dodd cdodd@acm.org
Post Follow-up to this messageOn Mar 17, 2:44 pm, Barry Kelly <barry.j.ke...@gmail.com> wrote:
>
> Candidates 1 & 2 compare equal to one another. By sane, I mean that it
> is only necessary to compare candidate 3 with candidate 1, because, by
> my sanity rule (see quote above), you don't need to compare against 2.
Ah - this is the problem. By my rules, 1 & 2 do not compare equal -
they compare is incompatible, since a call with either signature could
not match a candidate with the other.
I think you are considering cases ambiguous that I would consider as a
more specialised case overriding a more general one. Probably your
rule is the norm for most programming languages - it's not something
I've given a thought to for some time. When I wrote my tool, I went
with the rule that any candidate can be considered to override any
other provided that all parameters are at least as specialised, and
that at least one parameter is more specialised, with no incompatible
parameters. It seemed a natural generalisation of the single dispatch
rules to multiple dispatch, I think it's what treecc does, and to be
honest it didn't occur to me that any other rule might be valid.
That is, I can (and do) quite validly have code that includes
candidates for (parent, parent), (parent, child), (child, parent) and
(child, child) for the same function. The (child, child) case takes
precedence over all others. I also have cases with (parent, parent),
(parent, child1) and (child2, parent) - in this case a call for
(child2, child1) is ambiguous, but I allow candidates to be given
priorities so that the ambiguity can be resolved.
An extreme case for using priorities would be a dynamic version of the
overloading rules for arithmetic operators in C etc. You get a pattern
something like...
priority 1 variant add (int8, any) ...;
priority 2 variant add (any, int8) ...;
priority 3 variant add (int16, any) ...;
priority 4 variant add (any, int16) ...;
..
It obviously leaves the 'any' parameter still to be resolved in each
case, but at least the selected implementation knows what type to
return.
> Of course, many exponential and high polynomial algorithms work fine
> when n is small. Can you give some numbers - number of types, average
> compatibility rates (for any random type, what fraction of the total set
> of types is it compatible with, on average), numbers of candidates,
> numbers of parameters, cost of testing for compatibility?
I only wrote it around christmas time, and I haven't formally analysed
it. I'm not likely to, either. I just hit limitations in treecc and
needed to work around them. I only have two real applications, and one
of those is itself. It's generating maybe 15,000 lines (carriage
return count) in total of useful code, 3,000 or so for itself and the
rest for this other utility. I like my code generators to generate
readable code, so it's not all that dense either.
The main application is in the development of another code generation
utility, and it's basic handling walking of the AST using multiple
dispatch as an alternative to visitor-like patterns, plus dynamically-
typed values for expression evaluation. The expression evaluation
supports compiling expressions to C++ code with static type checking
to ensure the generated code will be valid, as well as evaluating
other expressions at compile time.
The number of parameters involved is rarely that large - two or three
normally, I don't recall ever needing four - which is one reason why
the rule induction is a bit daft. All it can do is order the parameter
checks in the decision tree, and there isn't much scope for re-
ordering them anyway. It *can* make a difference with two or three
parameters, but rarely does.
As for numbers of types, the worst case ATM is around 80, though at
least 10 are abstract. Most of what's left are the AST nodes - I'm
minimising my dynamic typing issues by using a variable-width integer
class (the priority example earlier is purely hypothetical). The AST
nodes tend to work OK since only a couple of additional context-
setting parameters are relevant to the dispatch for operations on
these nodes, with few cases for each. The usual pattern is something
like...
function handle_ast_node (sum_node, context)
{
call handle_sum (sum_node->arg1, sum_node->arg2);
}
That is, the dispatch for dynamic typing and for AST node stuff are
kept separate. I considered allowing the dispatch handler to look at
objects pointed to by fields as well as the object itself, but it's a
lot of work for a small gain.
The two real-world uses I have for the utility are itself and this
other code generator, and both build in a few seconds on my P4
machine. Much slower than the before-the-return-key-bounces response
of most code generation utilities these days, but obviously not a real
problem. I have one deliberately perverse test case which takes about
25 seconds.
> Basically, given that some of the problems with exponential explosion
> are both setup time and space for the approach you describe, I'm
> wondering if there's a compromise possible: a shorter setup time with a
> less explosive space, at the cost of doing a little max-finding at the
> end.
>
> Candidates can be trivially bucketed by number of parameters, so the
> number of candidates is only interesting for any given fixed number of
> parameters, so assume the below for each possible number of parameters:
I already evaluate one named function, with a known number of
parameters and a strict most-general-possible signature, at a time.
I'm pretty sure I have the smallest possible buckets.
> In other words, I think the above would be madness in a real-world
> compiler with e.g. Java-style overloading and typically only a small
> handful of candidates that ever need to be considered, but it may make
> sense for your case.
I need to re-read what you said, I think, but I have to agree with the
"it would be madness" comment here. My existing utility is doing mad
things anyway, and could work more efficiently by simply not doing
those mad things.
Post Follow-up to this messageOn Mar 18, 12:07 am, Chris Dodd <cd...@acm.org> wrote: > The usual way to do this is to compare each function against the list > of best matches for each argument irrespective of which function they > came from. That way you get O(nm) complexity in the number of > candidates and number of arguments. So after looking at #2, you'll > have (c, c) as your best match argument types and no function that > matches it. #3 will then match. Interesting. If I'm reading correctly, you're basically saying that for the candidate to be an unambiguous best match, all parameters must have the best match types. Otherwise, the candidate is either one of several ambiguous matches or incompatible. Based on this, the resolution process wouldn't know the difference between an unmatched call and an ambiguous call, and it wouldn't be able to list the valid candidates for ambiguous cases, but it would handle correct cases correctly and detect all errors. When an error is found, error reports can be based on an alternative resolution process - keeping the main resolution process efficient. I like it.
Post Follow-up to this messageOn Tue, 11 Mar 2008 16:35:50 -0700 (PDT), Steve Horne <stephenhorne100@aol.com> wrote: >I'm interested in data structures for efficient overload resolution. > : >What I'm really interested in is the case with multiple independent >parameters. I have very little idea how to even approach this problem. >The only guess I can make is along the lines of treating each >parameter as a separate dimension and using a spacial structure such >as R-trees, quadtrees or whatever, but of course normal spacial >indexes require that the keys are fully ordered along each dimension. Have you looked at how CLOS (Common Lisp Object System) handles multimethods? CLOS builds a discrimination tree which orders the type tests so as to eliminate as many potentials as possible at each decision point. Lisp happens to build the tree dynamically at runtime but it could just as well be done statically at compile time. There are a number of open source Lisps (CMUCL, SBCL, CLisp, etc.) available for code study. George --
Post Follow-up to this messageJust wanted to say thanks for the replies. I think I know what I'm doing now, certainly far better than I did when I first posted.
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread
Powered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.