Code Comments

Programming Forum and web based access to our favorite programming groups.
For Programmers: Free Programming Magazines | New: Database administration forum
Registration is free! Edit your profileCalendarFind other membersFrequently Asked QuestionsSearch -> 
Post New Thread











Thread
Author

Data structures for efficient overload resolution
I'm interested in data structures for efficient overload resolution.

If there was only one parameter to a function (and assuming single
inheritance) it would be easy to build an index for efficient overload
resolution. A tree structured the same as the heirarchy of the classes
would handle it. I imagine that better balanced tree structures are
possible using partial-ordering comparisons (is these two classes
super-and-subclass, sub-and-superclass, equal, or other), at least for
single inheritance cases. I'm certainly interested in reading any good
papers you can point me to on this kind of thing, but this is only a
very simplified case of the real problem.

What I'm really interested in is the case with multiple independent
parameters. I have very little idea how to even approach this problem.
The only guess I can make is along the lines of treating each
parameter as a separate dimension and using a spacial structure such
as R-trees, quadtrees or whatever, but of course normal spacial
indexes require that the keys are fully ordered along each dimension.

I have done overload resolution stuff in the recent past, for a tool I
wrote which is very similar to Rhys Weatherlys treecc - ie a code
generator for supporting multiple dispatch with a single inheritance
class heirarchy. However, the code I wrote for the overload resolution
is extremely naive. It evaluates all possible cases, then uses a rule
induction system to try to derive an efficient dispatch handler. Yes,
the rule induction bit is probably overkill, but I did it anyway. It
does minimise the average number of parameters explicitly tested in
the dispatch handler, so it's not totally pointless.

Getting back to the point, because it evaluates every possible
combination, it's vulnerable to combinatorial explosion issues. Not
good, but then the overload resolution needed to take priority rules
and constraint conditions into account as well as the basic
inheritance rules. Overall, I decided I don't care that much, and its
just something I wrote for my own use anyway.

But I now have a different problem with a very similar structure. It's
not overload resolution in the normal compiler sense, but it has the
same basic form. No arbitrary constraint conditions in this case, but
resolution does need to take into account (effectively) several
parameters with a single-inheritance type structure. Not sure whether
I need to resolve ambiguities with a priority system - probably not,
though.

So far, it's a simpler problem that the one I've already solved. The
trouble is, I need to cope with changes to the class heirarchies and
the set of overloads in real time. Hence I cannot precalculate
everything to build a simple decision tree. I need a more dynamic data
structure.

The actual application involves abstracting input methods so they can
be easily reconfigured at run-time. The idea is to have a heirarchy of
target processes and modes, and a heirarchy of physical/abstract input
source types, with each event mapping taking a form analogous to a
single function overload. If it sounds like I'm making things too
complex, that's probably true, but the investigation is more out of
curiosity than practical problem solving ;-)

So, can anyone point me to some interesting papers along these lines?

Thanks in anticipation.

Report this thread to moderator Post Follow-up to this message
Old Post
Steve Horne
03-14-08 01:03 AM


Re: Data structures for efficient overload resolution
Steve Horne wrote:

> I'm interested in data structures for efficient overload resolution.

> What I'm really interested in is the case with multiple independent
> parameters. I have very little idea how to even approach this problem.

This may or may not be useful to your specific case, in so far as it
may give another direction to look for ideas:

I look at overload resolution as a sorting problem where the actual
argument types provide the sorting key for a partial order. Any given
argument type needs to be able to compare two parameter types, and
return one of (<, >, =, <> (incompatible with either)), or possibly more
nuanced for narrowing vs. widening etc., assuming the sorting function
normalises this. There may be more argument types than there are
parameter types (e.g. the 'nil' or 'null' type, in such languages that
support it, usually cannot be a parameter type).

However, multiple parameters doesn't (normally) mean a lexicographical
ordering. Instead, the sorting function taking the key (composed of the
actual argument types) should return the ordering if only some of the
types in the key indicate an ordering; return incompatible if any
indicate <>; etc. with details governed by the specifics of your
overloading semantics.

Overload resolution is then reduced to creating and maintaining a list
of the "max" candidates, i.e. candidates which compare greater than any
previously seen candidate and equal to one another. If at the end of the
search, there are multiple candidates, then the result is ambiguous; no
candidates, not compatible; a single candidate, job done.

If the overload type ordering is sane, then this process should be
linear in the number of candidates (and arguments). By sane, I mean that
any candidate which compares greater than any previous candidate should
also compare greater than any candidate equal to the previous candidate,
preventing a potential quadratic mini-explosion.

-- Barry

--
http://barrkel.blogspot.com/

Report this thread to moderator Post Follow-up to this message
Old Post
Barry Kelly
03-15-08 12:40 AM


Re: Data structures for efficient overload resolution
On Mar 14, 2:42 pm, Barry Kelly <barry.j.ke...@gmail.com> wrote:

> There may be more argument types than there are
> parameter types (e.g. the 'nil' or 'null' type, in such languages that
> support it, usually cannot be a parameter type).

Dealing with this is one of the reasons I wrote my utility rather than
just using treecc. It's not too much of a hassle. I support three
cases. The default is that null parameters trigger a standard error
handler (normally an exception throw). If the parameter is flagged as
optional, null parameters are valid and handled either using a
specific null-handling implementation or by an implementation flagged
as able to handle both null and non-null values.

Strictly, the last two cases are not separate - even a null-value-only
handling implementation may only cover a subset of cases (e.g. due to
other parameters) and the resolver only requires that every case must
be unambiguous - not that they must all handle nulls the same way.


Before continuing, I should say that the basic idea that prompted me
to post earlier is one that I'm dropping as just far too silly to
persue even out of curiosity. Configurable input handling idea can
certainly be considered a multiple-dispatch problem, but I don't need
to change my 'class heirarchies' at run time. I might have multiple
viewports, but for input translation I only care about the type of
viewport, for instance, where all the types will have been decided at
compile time. The input mappings can be changed at run-time, but only
during special configuration-changing operations. That means that
multiple dispatch (as opposed to compile-time overload resolution) is
an even closer match - when the configuration is applied, I can
precalculate a decision tree just as I do for my existing utility.

Even so, I'm still interested in more efficient overload resolution
strategies, so...

> Overload resolution is then reduced to creating and maintaining a list
> of the "max" candidates, i.e. candidates which compare greater than any
> previously seen candidate and equal to one another. If at the end of the
> search, there are multiple candidates, then the result is ambiguous; no
> candidates, not compatible; a single candidate, job done.
>
> If the overload type ordering is sane, then this process should be
> linear in the number of candidates (and arguments). By sane, I mean that
> any candidate which compares greater than any previous candidate should
> also compare greater than any candidate equal to the previous candidate,
> preventing a potential quadratic mini-explosion.

I'm a little  here.

When my naive approach resolves one particular call, it checks every
single candidate. It looks linear at a glance, but it's not. An
apparent ambiguity can arise early in the search only to be resolved
later. For example, given a hierarchy with parent class p and child
class c, consider the following sequence of candidates...

1 (p, c)
2 (c, p)
3 (c, c)

When resolving for the call (c, c), candidates 1 and 2 will both
appear valid with neither being more specific than the other.
Therefore, when assessing candidate 3, it must be compared with both
of these two earlier candidates. Only then can these candidates be
eliminated. Comparisons must be done both with the call being resolved
and with those previously identified candidates that have yet to be
eliminated. Since the number of previously identified candidates can
grow linearly with the number of candidates checked so far, the
overall worst-case complexity is quadratic.

The list of previously identified not-yet-eliminated candidates looks
like your max-candidates list. Is that right? I'm  because I
see quadratic time in something you're describing it as linear unless
I do something insane. Perhaps this means that I've got something
insane locked into my mindset that I can't see past?

In my utility, the explosion issue is not this quadratic thing, but
because I'm evaluating every case (every possible call) prior to
generating a decision table. The number of cases grows exponentially
with the number of parameters, of course, and polynomially with the
number of possible run-time types for each parameter. And even that's
just the detonator - the real bang happens when the rule induction
tries to make sense of all those cases in order to build the decision
tree.


I'm now certain that my visions of adapting an ordered-key data
structure to the job were based on confusing some half-remembered
distinct meanings of the term 'partially ordered', and the idea of
combining that with R-trees is just nonsense. I'm still thinking
through some ideas based on organising the set of candidates in
advance to avoid having to check them all for each call resolved, but
I think I'll make sure the ideas are coherent before discussing
them ;-)

Report this thread to moderator Post Follow-up to this message
Old Post
Steve Horne
03-17-08 09:47 AM


Re: Data structures for efficient overload resolution
Steve Horne wrote:

> On Mar 14, 2:42 pm, Barry Kelly <barry.j.ke...@gmail.com> wrote:
 

> For example, given a hierarchy with parent class p and child
> class c, consider the following sequence of candidates...
>
>   1 (p, c)
>   2 (c, p)
>   3 (c, c)
>
> When resolving for the call (c, c), candidates 1 and 2 will both
> appear valid with neither being more specific than the other.

> Therefore, when assessing candidate 3, it must be compared with both
> of these two earlier candidates. Only then can these candidates be
> eliminated.

I don't see the reason for your "therefore". Did you omit it?

This is indeed what I meant about the overload type ordering being sane.

> Comparisons must be done both with the call being resolved
> and with those previously identified candidates that have yet to be
> eliminated.

Candidates 1 & 2 compare equal to one another. By sane, I mean that it
is only necessary to compare candidate 3 with candidate 1, because, by
my sanity rule (see quote above), you don't need to compare against 2.

It may be that your rules don't permit this, though. Can you explain in
more detail your resolution requirements, such as a counter-example,
i.e. some case where you *do* have to compare against 2 - and which in
my scheme, would make your system "insane"? :)

> In my utility, the explosion issue is not this quadratic thing, but
> because I'm evaluating every case (every possible call) prior to
> generating a decision table.

Yes, that sounds expensive.

> The number of cases grows exponentially
> with the number of parameters, of course, and polynomially with the
> number of possible run-time types for each parameter. And even that's
> just the detonator - the real bang happens when the rule induction
> tries to make sense of all those cases in order to build the decision
> tree.

Of course, many exponential and high polynomial algorithms work fine
when n is small. Can you give some numbers - number of types, average
compatibility rates (for any random type, what fraction of the total set
of types is it compatible with, on average), numbers of candidates,
numbers of parameters, cost of testing for compatibility?

Basically, given that some of the problems with exponential explosion
are both setup time and space for the approach you describe, I'm
wondering if there's a compromise possible: a shorter setup time with a
less explosive space, at the cost of doing a little max-finding at the
end.

Candidates can be trivially bucketed by number of parameters, so the
number of candidates is only interesting for any given fixed number of
parameters, so assume the below for each possible number of parameters:

* Let n = number of candidates
* Let m = number of arg types
* Let p = number of parameters

Create a sparse matrix C ("C" for compatible), indexed by arg types *
parameter index, where each element references a set of candidates,
sorted in some cheap total ordering (if intersections are to be computed
via sorting). Maximum size is m*p, but it should be smaller because I
would expect there isn't a candidate for every possible argument type in
every possible parameter position.

Then, for each argument type t, for each candidate c, for each parameter
index i, add c to the set C[t,i] if c's parameter at index i can take
argument type t.

Obviously, there can be no more than n*m*p elements of the sets in total
and n for any given set, but hopefully far less - if there aren't far
less, then there's little point building this structure.

Then, for overload resolution, you only need process the intersection of
each set from each C[t,i], or perhaps simply the smallest set returned
for any of the parameter indexes. If the set size is ever 1, then the
job is over quite early.

Computing the intersection might be more expensive than warranted,
depending on how costly the max-finding overload resolution process is.

This overall should work out to (worst case) O(n*m*p*p) in size, and,
depending on the nature of the candidates and types, a greatly-reduced n
when doing the O(n) max-finding. I'm guessing that the real-world
n*m*p*p should be quite a bit less than m^p, unless I've misunderstood
the nature of your problem - that n is large in your case.

On the lookup side, it should be O(p+f*n), where f is the fraction that
the preprocessing work reduces the final resolution max-finding to.

In other words, I think the above would be madness in a real-world
compiler with e.g. Java-style overloading and typically only a small
handful of candidates that ever need to be considered, but it may make
sense for your case.

-- Barry

--
http://barrkel.blogspot.com/

Report this thread to moderator Post Follow-up to this message
Old Post
Barry Kelly
03-18-08 09:46 AM


Re: Data structures for efficient overload resolution
Steve Horne <stephenhorne100@aol.com> wrote in news:08-03-063@comp.compilers:
> When my naive approach resolves one particular call, it checks every
> single candidate. It looks linear at a glance, but it's not. An
> apparent ambiguity can arise early in the search only to be resolved
> later. For example, given a hierarchy with parent class p and child
> class c, consider the following sequence of candidates...
>
>   1 (p, c)
>   2 (c, p)
>   3 (c, c)
>
> When resolving for the call (c, c), candidates 1 and 2 will both
> appear valid with neither being more specific than the other.
> Therefore, when assessing candidate 3, it must be compared with both
> of these two earlier candidates. Only then can these candidates be
> eliminated. Comparisons must be done both with the call being resolved
> and with those previously identified candidates that have yet to be
> eliminated. Since the number of previously identified candidates can
> grow linearly with the number of candidates checked so far, the
> overall worst-case complexity is quadratic.

The usual way to do this is to compare each function against the list
of best matches for each argument irrespective of which function they
came from.  That way you get O(nm) complexity in the number of
candidates and number of arguments.  So after looking at #2, you'll
have (c, c) as your best match argument types and no function that
matches it.  #3 will then match.

Chris Dodd
cdodd@acm.org

Report this thread to moderator Post Follow-up to this message
Old Post
Chris Dodd
03-18-08 09:46 AM


Re: Data structures for efficient overload resolution
On Mar 17, 2:44 pm, Barry Kelly <barry.j.ke...@gmail.com> wrote:
 
>
> Candidates 1 & 2 compare equal to one another. By sane, I mean that it
> is only necessary to compare candidate 3 with candidate 1, because, by
> my sanity rule (see quote above), you don't need to compare against 2.

Ah - this is the problem. By my rules, 1 & 2 do not compare equal -
they compare is incompatible, since a call with either signature could
not match a candidate with the other.

I think you are considering cases ambiguous that I would consider as a
more specialised case overriding a more general one. Probably your
rule is the norm for most programming languages - it's not something
I've given a thought to for some time. When I wrote my tool, I went
with the rule that any candidate can be considered to override any
other provided that all parameters are at least as specialised, and
that at least one parameter is more specialised, with no incompatible
parameters. It seemed a natural generalisation of the single dispatch
rules to multiple dispatch, I think it's what treecc does, and to be
honest it didn't occur to me that any other rule might be valid.

That is, I can (and do) quite validly have code that includes
candidates for (parent, parent), (parent, child), (child, parent) and
(child, child) for the same function. The (child, child) case takes
precedence over all others. I also have cases with (parent, parent),
(parent, child1) and (child2, parent) - in this case a call for
(child2, child1) is ambiguous, but I allow candidates to be given
priorities so that the ambiguity can be resolved.

An extreme case for using priorities would be a dynamic version of the
overloading rules for arithmetic operators in C etc. You get a pattern
something like...

priority 1 variant add (int8, any) ...;
priority 2 variant add (any, int8) ...;
priority 3 variant add (int16, any) ...;
priority 4 variant add (any, int16) ...;
..

It obviously leaves the 'any' parameter still to be resolved in each
case, but at least the selected implementation knows what type to
return.

> Of course, many exponential and high polynomial algorithms work fine
> when n is small. Can you give some numbers - number of types, average
> compatibility rates (for any random type, what fraction of the total set
> of types is it compatible with, on average), numbers of candidates,
> numbers of parameters, cost of testing for compatibility?

I only wrote it around christmas time, and I haven't formally analysed
it. I'm not likely to, either. I just hit limitations in treecc and
needed to work around them. I only have two real applications, and one
of those is itself. It's generating maybe 15,000 lines (carriage
return count) in total of useful code, 3,000 or so for itself and the
rest for this other utility. I like my code generators to generate
readable code, so it's not all that dense either.

The main application is in the development of another code generation
utility, and it's basic handling walking of the AST using multiple
dispatch as an alternative to visitor-like patterns, plus dynamically-
typed values for expression evaluation. The expression evaluation
supports compiling expressions to C++ code with static type checking
to ensure the generated code will be valid, as well as evaluating
other expressions at compile time.

The number of parameters involved is rarely that large - two or three
normally, I don't recall ever needing four - which is one reason why
the rule induction is a bit daft. All it can do is order the parameter
checks in the decision tree, and there isn't much scope for re-
ordering them anyway. It *can* make a difference with two or three
parameters, but rarely does.

As for numbers of types, the worst case ATM is around 80, though at
least 10 are abstract. Most of what's left are the AST nodes - I'm
minimising my dynamic typing issues by using a variable-width integer
class (the priority example earlier is purely hypothetical). The AST
nodes tend to work OK since only a couple of additional context-
setting parameters are relevant to the dispatch for operations on
these nodes, with few cases for each. The usual pattern is something
like...

function handle_ast_node (sum_node, context)
{
call handle_sum (sum_node->arg1, sum_node->arg2);
}

That is, the dispatch for dynamic typing and for AST node stuff are
kept separate. I considered allowing the dispatch handler to look at
objects pointed to by fields as well as the object itself, but it's a
lot of work for a small gain.

The two real-world uses I have for the utility are itself and this
other code generator, and both build in a few seconds on my P4
machine. Much slower than the before-the-return-key-bounces response
of most code generation utilities these days, but obviously not a real
problem. I have one deliberately perverse test case which takes about
25 seconds.

> Basically, given that some of the problems with exponential explosion
> are both setup time and space for the approach you describe, I'm
> wondering if there's a compromise possible: a shorter setup time with a
> less explosive space, at the cost of doing a little max-finding at the
> end.
>
> Candidates can be trivially bucketed by number of parameters, so the
> number of candidates is only interesting for any given fixed number of
> parameters, so assume the below for each possible number of parameters:

I already evaluate one named function, with a known number of
parameters and a strict most-general-possible signature, at a time.
I'm pretty sure I have the smallest possible buckets.

> In other words, I think the above would be madness in a real-world
> compiler with e.g. Java-style overloading and typically only a small
> handful of candidates that ever need to be considered, but it may make
> sense for your case.

I need to re-read what you said, I think, but I have to agree with the
"it would be madness" comment here. My existing utility is doing mad
things anyway, and could work more efficiently by simply not doing
those mad things.

Report this thread to moderator Post Follow-up to this message
Old Post
Steve Horne
03-19-08 12:26 AM


Re: Data structures for efficient overload resolution
On Mar 18, 12:07 am, Chris Dodd <cd...@acm.org> wrote:

> The usual way to do this is to compare each function against the list
> of best matches for each argument irrespective of which function they
> came from.  That way you get O(nm) complexity in the number of
> candidates and number of arguments.  So after looking at #2, you'll
> have (c, c) as your best match argument types and no function that
> matches it.  #3 will then match.

Interesting. If I'm reading correctly, you're basically saying that
for the candidate to be an unambiguous best match, all parameters must
have the best match types. Otherwise, the candidate is either one of
several ambiguous matches or incompatible.

Based on this, the resolution process wouldn't know the difference
between an unmatched call and an ambiguous call, and it wouldn't be
able to list the valid candidates for ambiguous cases, but it would
handle correct cases correctly and detect all errors. When an error is
found, error reports can be based on an alternative resolution process
- keeping the main resolution process efficient.

I like it.

Report this thread to moderator Post Follow-up to this message
Old Post
Steve Horne
03-19-08 09:38 AM


Re: Data structures for efficient overload resolution
On Tue, 11 Mar 2008 16:35:50 -0700 (PDT), Steve Horne
<stephenhorne100@aol.com> wrote:

>I'm interested in data structures for efficient overload resolution.
> :
>What I'm really interested in is the case with multiple independent
>parameters. I have very little idea how to even approach this problem.
>The only guess I can make is along the lines of treating each
>parameter as a separate dimension and using a spacial structure such
>as R-trees, quadtrees or whatever, but of course normal spacial
>indexes require that the keys are fully ordered along each dimension.

Have you looked at how CLOS (Common Lisp Object System) handles
multimethods?  CLOS builds a discrimination tree which orders the type
tests so as to eliminate as many potentials as possible at each
decision point.  Lisp happens to build the tree dynamically at runtime
but it could just as well be done statically at compile time.

There are a number of open source Lisps (CMUCL, SBCL, CLisp, etc.)
available for code study.

George
--

Report this thread to moderator Post Follow-up to this message
Old Post
George Neuner
03-19-08 09:38 AM


Re: Data structures for efficient overload resolution
Just wanted to say thanks for the replies. I think I know what I'm
doing now, certainly far better than I did when I first posted.


Report this thread to moderator Post Follow-up to this message
Old Post
Steve Horne
03-23-08 09:56 AM


Sponsored Links




Last Thread Next Thread Next
Search this forum -> 
Post New Thread

Compilers archive

Show a Printable Version Send to friend Email This Page to Someone! subscribe to this thread Receive updates to this thread
Computer Consultants
Programming Jobs
Visual Basic Controls
SQL Server Programming
Webservices
Java Security
Visual Studio
C# Programming
Visual J++
Software engineering
Open source Software
Perl Programming
PHP Programming
ASP Programming
ASP .NET Programming
Visual Basic Programming
Windows Scripting Host
Java Programming
Java Help
Java Beans
VBScript
Cobol
MAC Applications
Unix Programming
Forum Jump:
All times are GMT. The time now is 09:48 PM.

 
Free MCSE Braindumps | Real Estate Topics

Programming forum archive

Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.