For Programmers: Free Programming Magazines  


Home > Archive > Compilers > August 2004 > Implementing set theory operations similar to SQL's select and update









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Implementing set theory operations similar to SQL's select and update
Barry Kelly

2004-08-23, 4:02 pm

I'm writing an engine designed to implement behaviour which may be
customized along multiple dimensions.

This might seem a bit off-topic, but in my defense I say:

1. My problem is, in essence, implementing a set-based language,
although written through a GUI and probably using predicate expression
evaluation trees rather than machine code for implementation (although
that may be very slow, and that's what this post is about).

2. comp.db.theory isn't moderated and has a low signal/noise ratio, is
mostly about normalization and use of databases, rather than
specifically implementing the mechanics of (say) SQL.

The Problem
-----------
(For those interested in why this is the problem, the background
overview below has more information).

I have a sequential list of multiple (1000+), overlapping set
definitions. I want to know, quickly enough to be usable for a
developer using a GUI front end, does the set they are defining
completely encompass a previously defined set? Is it only a partial
subset? What is that subset? Can that subset be converted easily into
its own representation rather than as the product of set difference
and/or intersection?

If that seems a little vague, the details in the overview below flesh
it out a bit more. The format of the set definitions looks like (dimX,
dimY, dimZ, ...) where each dimN is either an explicit set "{a, b, c}"
or an 'any' rule "*".

Does anybody know of concrete papers, books, articles dealing with
concrete implementation details and algorithms that I should be
tracking down rather than the (thus far unsuccessful) general searches
for "set theory"?

A Brief Background Overview
---------------------------
Example: given a list of business requirements such as:

1: Field f_A has validation v_X for all applications.
2: Field f_A has override validation v_Y in application app_1 for
supervisors.
3: Field f_A has override validation v_Z in application app_1 in state
TX.

Object-oriented techniques have been rejected because they constrain
me to using a fixed order of dimensions for reuse and overriding. A
possible solution presents itself:

1. Creating a notional Cartesian-product of all dimensions of
customization.

2. Overriding based on the application of rules whose scope is defined
by sets described by a simple language: by example, '{a,b}' represents
the set containing a and b, while '*' represents the set containing
all the values for that dimension.

Using the business requirements above:

The dimensions: (app, field, supervisor: bool, state)
The value: the validation rule

The requirements:

1: (*, f_A, *, *) => v_X
2: (app_0, f_A, true, *) => v_Y
3: (app_0, f_A, *, TX) => v_Z

The table generated from these dimensions:

Location | Rule
-------------------|------
app_0 f_A true MS | v_Y
app_0 f_A true TX | v_Z
app_0 f_A false MS | v_X
app_0 f_A false TX | v_Z
// other applications all use v_X

Essentially, when an application needs to know the rule it needs to
apply to a particular field, it works out its dimension in
multi-dimensional space, in the form of a tuple which corresponds to
the location column: (app_0, f_A, true, TX) for example.

This seems to lead to combinatorial explosion, but that is obviated by
the fact that a subset of this *notional* table will be selected on
the basis of version, state, etc., and thence precalculated and
stored. It will thus grow no faster or worse than any 'normal'
application where each part is specified by hand or limited
object-oriented inheritance. The list of requirements which defines
how the notional table is generated forms the real backing store, and
it grows linearly. However, the interaction within this list, or stack
of rule applications, needs to be calculated with at least some level
of efficiency.

That's where set theory steps in, hopefully.

P.S.: The above model is simplified. Actually, there are more than
just validation rules in the table, and rather than storing rules, an
ordered list of rules are stored. Each requirement needs to specify
whether it overrides the previous requirement, comes before it,
inserts in the middle, etc. That's all easy enough once the set of
applications is worked out.

-- Barry Kelly
George Neuner

2004-08-25, 3:59 pm

On 23 Aug 2004 12:07:59 -0400, barry_j_kelly@hotmail.com (Barry Kelly)
wrote:

>I'm writing an engine designed to implement behaviour which may be
>customized along multiple dimensions.
>
>1. My problem is, in essence, implementing a set-based language,
>although written through a GUI and probably using predicate expression
>evaluation trees rather than machine code for implementation (although
>that may be very slow, and that's what this post is about).
>
>I have a sequential list of multiple (1000+), overlapping set
>definitions. I want to know, quickly enough to be usable for a
>developer using a GUI front end, does the set they are defining
>completely encompass a previously defined set? Is it only a partial
>subset? What is that subset? Can that subset be converted easily into
>its own representation rather than as the product of set difference
>and/or intersection?
>
>Does anybody know of concrete papers, books, articles dealing with
>concrete implementation details and algorithms that I should be
>tracking down rather than the (thus far unsuccessful) general searches
>for "set theory"?
>


It's not clear from your problem description whether there is an
actual database or whether the rules define abstract sets. It really
doesn't matter - in order to evaluate the sets for equality you need
to have some kind of concrete representation for them.

The rulebase will be easier to evaluate if each rule stands alone as a
logical predicate. If this is not true for the human readable
representation (very likely 8-), you should internally reduce rule
chains to a single rule that describes the result directly. If
partial chain application is allowed, you should produce a separate
direct rule for each possible partial chain. In application, you
identify and apply the most specific rule.

Regarding the coverage problem - you will definitely want to avoid
creating a cartesian product on scads of dimensions ... instead you'll
want to order evaluation of the predicates in such a way as to produce
the smallest possible result set at each step. If each rule stands
alone, the evaluation order will not affect the result. Identify and
compute intersections before unions or joins to keep the intermediate
results as small as possible.

Using the database analogy, you don't need to actually create the
intermediate table at each step ... you only need to compute the
columns and estimate the number of rows (ie. a virtual table). At
each step, you want to identify and perform the operation that will
produce the table with the least number of result rows. You continue
until either a single table remains or all the remaining intermediate
tables are empty (a null query). The final table, when filled in with
data, is your optimized query.

For practical matters, you should look for papers on database query
optimization, solving equation and logic predicate systems, etc. You
should also read up on dynamic programming techniques if you aren't
familiar with them. I'm sorry I don't have any good cites to give you
- my work has taken me in other directions and I don't follow research
in these fields very closely any more.

AFA implementation - these kinds of problems are best solved in logic
or functional languages (or in languages that easily accomodate those
techniques). I personally like Lisp, but ML, Prolog or one of their
close variants would also work well. This is not the kind of problem
I would like to tackle in C++, Java or the language du jour.

George
Donald F. McLean

2004-08-25, 3:59 pm

Barry Kelly wrote:
> I'm writing an engine designed to implement behaviour which may be
> customized along multiple dimensions.
>
> This might seem a bit off-topic, but in my defense I say:


I don't think that it's off topic.

> The Problem
> -----------
> (For those interested in why this is the problem, the background
> overview below has more information).
>
> I have a sequential list of multiple (1000+), overlapping set
> definitions. I want to know, quickly enough to be usable for a
> developer using a GUI front end, does the set they are defining
> completely encompass a previously defined set? Is it only a partial
> subset? What is that subset? Can that subset be converted easily into
> its own representation rather than as the product of set difference
> and/or intersection?


This high level description of the problem makes perfect sense. It looks
like an interesting problem.

Unfortunately, I can't make heads or tails of your detailed description.

It seems to me that one possible approach would be to think of each set
as a group of contiguous regions in the n-dimensional space. It should
be pretty easy to pick out obvious disjoints.

The tricky part is comparing two regions that are in roughly the same
place. I think that one possible approach would be to stick to the
surfaces of the objects being compared. If you have a surface that
defines the outside of region A and another surface that defines the
outside of region B, find the places where the surfaces interesect. It's
pretty easy, then, to determine the parts of A that are not in B and the
parts of B that are not in A.

I'm not sure if I'm making any sense or not. Hopefully it will at least
generate a useable idea.

> Object-oriented techniques have been rejected because they constrain
> me to using a fixed order of dimensions for reuse and overriding.


There are many ways to implement flexible multi-dimensional processing -
ESPECIALLY in an object oriented language.

On the other hand, not using an object oriented language because you're
not familiar enough with the paradigm to comfortably implement mission
critical software is a perfectly valid rational.

There's no shame in sticking to what works for you, but please don't
make unsupportable claims about the shortcommings of OO.

Donald McLean
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com