Home > Archive > Compilers > April 2006 > Implementing OO
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
|
|
| xnooga@gmail.com 2006-03-23, 4:12 am |
| Hi!
I'm writing loosely typed script language called eL with compiler and
embeddable vm. I would like to make it object oriented. Unfortunaltely
I couldn't find any good docs about it and I have a smattering about
it. My question is how to build a simple class mechanism which will
enable users to specify own classes in eL and which will support
external classes (useful for embedding). Thank you in advance.
greetings,
M.G.
[I like the discussion in Wilhelm and Maurer, dense but complete. -John]
| |
| MainStem 2006-03-27, 4:02 am |
| hello my friend i think you have to check out the Java virtual machine
implementation for doing that you have to perform some data structures
that encapsulates all class memebers and methods inside and once the
user refer to that class name in the data structure you retrieve all
its members and initialize its members in memory then the user can
refer to any member inside of that class according to the data
attributes protection if it is public or private or even internal or
portected so you have to look first at Java virtual machine
implementation and try to imitate it my friend.
check the following url and see how they could perform OOP in java
according to their specifications.
http://java.sun.com/docs/books/vmsp...pecTOC.doc.html
| |
| oliverhunt@gmail.com 2006-03-27, 4:02 am |
| When you say class features, do you mean structs with methods,
or are you referring to polymorphism?
I've read a number of papers on the implementation of object
oriented languages but in all honesty i can't remember what any of
them were called (though http://citeseer.ist.psu.edu/ may prove a
good resource). Anyhoo, I'll just describe the implementation of
basic single line inheritance with virtual functions here (hopefully
without making it seem to awful/convoluted).
The first thing you need to realise is that classes/object orientation
is largely syntactic sugar, a class for instance is only a struct, with
a number of methods associated as I'll describe now (I have no idea
what you language looks like or what you implementation language
is, so i'm just assuming standard brace-styled object oriented
language, a la java, and that you are implementing in C, hopefully
you'll be able to see what's going on regardless of you actual
circumstance)
Anyhoo, giving structs methods is simple syntactic sugar. eg.
class Foo {
int bar;
int getBar(){ return bar; }
void setBar(int bar){ this.bar = bar; }
}
( or the equivalent syntax in your language)
is identical to:
struct Foo {
int bar;
}
int Foo_getBar(Foo *this){ return this->bar; }
void Foo_setBar(Foo *this, int bar){ this->bar = bar; }
So assuming we have an instance of Foo called f
f.setBar(12);
becomes
Foo_setBar(f, 12);
Hopefully this equivalence will be fairly obvious.
Now say we wanted polymorphism (eg. virtual functions) eg.
class Shape {
abstract int getArea();
abstract Shape scale(float x, float y);
}
now we get:
struct Shape {
void **vtable;
}
int Shape_getArea(Shape *this){
//assuming the compiler makes getArea the first virtual function
return ((int(*)(Shape*))(this->vtable[0]))(this);
}
Shape* Shape_scale(Shape *this, int x, int y){
return ((Shape*(*)(Shape*, int, int))(this->vtable)[1]))(this, x, y);
}
the vtable is an array of function pointers, pointing to an array
determined on a per class basis. Normally the functions
Shape_scale and Shape_getArea would be inlined directly into the
output code, but i have seperated them to aid clarity.
Now say we had a subclass of Shape, Square:
class Square extends Shape {
int size;
Square(width, height){ ... }
int getArea(){ return size*size; }
}
Now this would become:
struct Square {
Shape super;
int size;
}
with a couple of functions:
int Square_getArea(Square *this){ ... }
Shape *Square_scale(Square *this, int x, int y){ ... }
To be able to use the virtual methods Square has provided we
need to have a vtable:
void **Square_vtable={Square_getArea, Square_scale};
Now whenever a square is allocated you compiler must generate
code to make the Squares vtable (Square::super::vtable) point
to Square_vtable. Eg.
Square s = new Square(2);
becomes something along the lines of:
Square *newSquare=(Square*)malloc(sizeof(Square
));
((void***)newSquare)[0]=Square_vtable;
Square_Constructor(newSquare, 2); //call the square constructor
"((void***)newSquare)[0]=Square_vtable;" is used to set the
vtable because the vtable is always the first field in every class,
so this method of assigning the vtable will work regardless as to
the depth of inheritance.
now when we have the code:
Square sq=new Square(2);
Shape s=sq;
s.getArea();
So we get:
Square *sq = //allocate memory for Square setup vtable and call
constructor
Shape *s=//top options here, but they're both effectively the same --
(Shape*)sq or &(sq->super)
Shape_getArea(s);
now Shape_getArea will use s->vtable[0] as its getArea function,
and since s is pointing to the Square we allocated, s->vtable will
be Square_vtable. So the function that will be called in the end
will be Square_getArea (see the definition of Square_vtable).
Thus the end result will be the correct call.
Now applying these steps will allow you to provide a simple single
inheritance object system, providinng things like multiple inheritance
and interfaces (which are basically a restricted case of MI) make
things a lot more complicated so i won't go into them here. There
are also a number of optimisations you can make which I haven't
gone into for reasons of simplicity.
So I hope I've helped (or at least not scared you off :) ),
Oliver
| |
| Ido.Yehieli@gmail.com 2006-03-27, 4:02 am |
| Do you have/know how to implement something similar C ltructs?
If not, take a look here:
http://www.lysator.liu.se/c/ANSI-C-grammar-l.html
and here:
http://www.lysator.liu.se/c/ANSI-C-grammar-y.html
Once your done with that, you can specify a simple OO mechanism that
uses the features you have in your langauge (I assume you also have
functions & types, if not they are easy to implement) like that (i will
write it in a c like language):
struct myClass {
someType someField;
someOtherType someOtherField;
...
}
myClassMethod1(myClass* self, arg1, arg2, ... , argn){
...
}
You can add inheritance, for instance, by extending your struct-like
structures to be inharitable- that shouldn't be to difficult to get
from C like structs to something that can do this:
struct myNewClass:myClass {
/*the declaration above will automatically add a final field called
super
of type myClass */
someType someNewField;
...
}
myNewClassMethod1(myNewClass* self, arg1, arg2, ... , argn){
...
}
and myClassMethod1 will be defined to accept myClass* as it's first
parameter or any struct who's super is of type myClass.
After you have that going you can hide some of the mechanism from the
user, for example- by using a new keyword "class" that will define
these things (like the prefix of the functions and self as the first
parameter) automatically.
Hope that helped a bit,
Ido Yehieli.
| |
| Dmitry A. Kazakov 2006-03-29, 7:04 pm |
| On 27 Mar 2006 01:24:27 -0500, oliverhunt@gmail.com wrote:
> The first thing you need to realise is that classes/object orientation
> is largely syntactic sugar, a class for instance is only a struct, with
> a number of methods associated
[...]
This presumes single disipatch model. For the case of multiple dispatch
methods aren't associated with instances, they are with tuples of
instances. A more restricted case of MD is when a tuple contains only
instances from the same types hierarchy, that's a multimethod. Typical
examples are dyadic operations and assignment.
I don't agree that it is just sugar. Inheritance brings many new choices:
- class vs. type,
- supertype vs. subtype,
- convariant vs. contravariant,
- method vs. free subroutine,
- polymorhic vs. specific
etc. These choices have to be expressed syntactically, and a compiler have
to handle all that mess.
The internal infrastructure is also not an easy thing. Even in a statically
typed language which has declaration scopes, types (and so classes) could
be dynamically created and destroyed. That would require a complex
mechanics to handle dispatching tables, maybe, upward closures to prevent
instances outliving the scope of its type, etc.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
| |
| oliverhunt@gmail.com 2006-04-03, 4:14 am |
| > This presumes single disipatch model. For the case of multiple dispatch
> methods aren't associated with instances, they are with tuples of
> instances. A more restricted case of MD is when a tuple contains only
> instances from the same types hierarchy, that's a multimethod. Typical
> examples are dyadic operations and assignment.
Yes you're right, however i made the (potentially incorrect)
assumption that they were looking at simple single inheritance, single
dispatch. Give the nature of the question i had thought that a
reasonable assumption :)
> I don't agree that it is just sugar. Inheritance brings many new choices:
Well, most of it is -- certainly the bit that most people associated
with OO, eg. instance.method()
Even virtual calls aren't too problematic (though a hell of a lot more
work)
> - class vs. type,
I actually don't know what you mean by this :)
> - supertype vs. subtype,
You're right this would be a pain for the developer to have to
implement themselves. *However* an ability to do a type comparison
(eg. Java's instanceof) is arguably a bad thing as it incourages
"switch on type" behaviour which is frowned on in some circles. (I
realise there are pros and cons to type comparisons, however my
argument is that it is not *necessary*)
> - convariant vs. contravariant,
And i've never been sure of what those words means -- feel free to
enlighten me :)
> - method vs. free subroutine,
> - polymorhic vs. specific
once again no idea what you mean :(
>
> etc. These choices have to be expressed syntactically, and a compiler have
> to handle all that mess.
No they don't, polymorphic methods for instance can be implemented in
languages without support for OO features, and there's more overhead,
as to be understandable by people you would need to have utility
functions. Type hierarchy lookups can be performed with libraries,
once again not as efficient as a compiler doing it itself -- though
it's likely to just be dumping what would otherwise be library code
directly into the generated code.
As for the other features you talk about (namely the ones i don't know
anything about) I can't legitimately respond. Because I don't know
anything about them -- well i may, but i certainly don't recognise the
names :)
>
> The internal infrastructure is also not an easy thing. Even in a statically
> typed language which has declaration scopes, types (and so classes) could
> be dynamically created and destroyed.
Runtime type creation? Possible but not *required* for OO, in fact
I'm not sure how any non-vm/interpreted language could generate new
types at runtime, as it requires codegen -- so any compiled executable
would have to have access to the compiler.
> That would require a complex
> mechanics to handle dispatching tables, maybe, upward closures to prevent
> instances outliving the scope of its type, etc.
Dispatch tables aren't too much of a problem (unless you're generating
types at runtime) and I don't know what you mean by the remainder of
the sentence.
Apologies for the bits i didn't understand -- these are gaps in my
knowledge rather than any failure on your part to communicate :)
Cheers,
Oliver
| |
| Dmitry A. Kazakov 2006-04-08, 7:03 pm |
| On 3 Apr 2006 01:32:55 -0400, oliverhunt@gmail.com wrote:
> I actually don't know what you mean by this :)
Class is a type built as a closure of a set of types. Clearly, in some
contexts the language should distinguish classes and plain types. And this
has influence on both syntax and semantics, especially for types matching.
>
> You're right this would be a pain for the developer to have to
> implement themselves. *However* an ability to do a type comparison
> (eg. Java's instanceof) is arguably a bad thing as it incourages
> "switch on type" behaviour which is frowned on in some circles. (I
> realise there are pros and cons to type comparisons, however my
> argument is that it is not *necessary*)
Well, I actually didn't mean comparisons, however it is also an issue. I
rather meant declaration of new types which are either sub- or super- or
both -classes/types of other types. This is a types algebra to implement.
It is a completely new language relatively to the "non-OO" kernel. Your
point, as I understood it, could be summarized as follows. The existing
algebra providing solely aggregation (= record and array types) can be
effortlessly exploited for OO. Maybe it historically started this way, but
it is not so now.
> And i've never been sure of what those words means -- feel free to
> enlighten me :)
Whether the parameters of a subprogram are covariant (hence dispatching) or
not. In C++ the first (hidden) parameter is covariant all others and the
result are contravariant. It is a quite strange choice, because there is
nothing special in the first parameter. Nevertheless, they are treated
differently by the compiler, especially when it comes to infix notation of
operators.
> once again no idea what you mean :(
A subroutine can be a method (subject of inheritance) or not. Depending on
that it will be either overloadable or overridable. It can't be both
(better not.)
Both objects and subroutines can be polymorphic or not. The combination
parameter x subroutine determines whether a call will be dispatching and
whether the body will treat nested calls as dispatching.
Especially objects, when polymorphic need to keep the type tag. There is no
way to make all objects polymorphic, because that would exclude small
objects like bits, pointers and the values of type tags itself. So the
language should in some form distinguish polymorphic vs. specific objects.
>
> Runtime type creation? Possible but not *required* for OO, in fact
> I'm not sure how any non-vm/interpreted language could generate new
> types at runtime, as it requires codegen -- so any compiled executable
> would have to have access to the compiler.
Not necessarily. Let the code of all methods be known. Then there would be
no need to recompile it, just to set some hooks. It like declaring an array
with the number of elements determined at run-time.
>
> Dispatch tables aren't too much of a problem (unless you're generating
> types at runtime) and I don't know what you mean by the remainder of
> the sentence.
I mean something like:
void Foo ()
{
class FooException : public Exception {}; // Illegal in C++
throw FooException ();
};
What is the type of the object caught outside Foo?
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
| |
| Henry Spencer 2006-04-17, 10:02 pm |
| Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote:
>
>Whether the parameters of a subprogram are covariant (hence dispatching) or
>not. In C++ the first (hidden) parameter is covariant all others and the
>result are contravariant. It is a quite strange choice, because there is
>nothing special in the first parameter...
It's actually a very natural choice, provided you wish to dispatch on only
one parameter, for encapsulation or performance (and a big reason for C++'s
success is its performance). That decision means that the dispatch-on
parameter *is* special, inherently different from all the other parameters.
To respond to Oliver's request for enlightenment...
+ Covariance means that parameters or results of a method redefined in a
child type can be more specific than their counterparts in the parent's
method (i.e., their types can be children of the types named by the
parent's method).
+ Contravariance means that if they are different, they have to be *less*
specific.
Why might you want covariance? Well, if the parent's method for, say,
addition takes two parameters of the parent type and yields the parent
type, it would seem natural that the child's method would take two
parameters of the child type and yield the child type: if the parent type
is P and the child type is C, the parent's method has the signature
ADD(P,P):P and the child's has ADD(C,C):C.
Why might you want contravariance? Consider when a method Y can safely be
substituted for a method X: when Y's preconditions are the same or weaker
(so that any call which satisfies X's preconditions will also satisfy Y's)
and Y's postconditions are the same or stronger (so that the results from
Y will always satisfy X's postconditions). If you think of parameter type
compatibility as being part of the preconditions -- declaring parameter P
to be of type T implicitly adds a precondition "P is of type T" -- then a
weaker form of such a precondition must require a *less* specific type.
You can't safely substitute ADD(C,C):C for ADD(P,P):P, because ADD(C,C):C
has more-specific preconditions, not less-specific ones: what happens if
it's called with parameters of the parent type? ADD(P,P):C would be okay:
covariance of *results* is safe, because it strengthens *postconditions*.
But combining covariance of parameters with type safety is really tricky
unless you dispatch on all parameters (so each method sees only the
combination of parameter types it's prepared for), which unfortunately
tends to be slow and incurs other hassles.
>A subroutine can be a method (subject of inheritance) or not. Depending on
>that it will be either overloadable or overridable. It can't be both
>(better not.)
Actually, one solution to the co/contravariance problem relies on being
able to do both: <http://david.stoutamire.com/tr-97-061.ps>. The key
observation is that covariance is to contravariance as overloading is
to overriding. Overriding must be contravariant for type safety, but
overloading lets you provide type-safe covariance.
>...There is no
>way to make all objects polymorphic, because that would exclude small
>objects like bits, pointers and the values of type tags itself.
This can be treated as an optimization problem rather than a matter of
language definition. Small objects can be used by the implementation when
the value in question is known at compile time not to be polymorphic.
This does put more of a burden on the compiler, making this particular
"optimization" crucial to usability; that tempts people to make the
distinction in the language itself, to make it easier for the compiler
(and to force compiler implementers to do it).
[color=darkred]
Note that dynamic loading of separately-compiled code has much the same
effect: you may know the interface of the type, but you inherently don't
know anything else about it until it gets loaded at runtime. Loading one
is pretty much akin to creating a new type at runtime.
--
spsystems.net is temporarily off the air; | Henry Spencer
mail to henry at zoo.utoronto.ca instead. | henry@spsystems.net
| |
| Dmitry A. Kazakov 2006-04-21, 10:04 pm |
| On 17 Apr 2006 23:46:16 -0400, Henry Spencer wrote:
> Dmitry A. Kazakov <mailbox@dmitry-kazakov.de> wrote:
> It's actually a very natural choice, provided you wish to dispatch on only
> one parameter, for encapsulation or performance (and a big reason for C++'s
> success is its performance).
There are better ways to solve the performance problem. One of them is
to prevent re-dispatch.
>
> Actually, one solution to the co/contravariance problem relies on being
> able to do both: <http://david.stoutamire.com/tr-97-061.ps>.
Hmm, I don't see how it solves the problem. Clearly, for each
dispatching method all slots in its dispatching table can be observed
as individual "overloaded" subroutines. When types are statically
known, the result will be equivalent to dispatch. The problem though
is that the types might be unknown. Even the result type. An example
is abstract factory. But it is not what I actually meant, sorry ,if I
used sloppy language. I did prevention of occasional overloading where
an overriding was intended.
BTW, the common claim that immutability is the key to solve LSP
problem is wrong. A trivial analysis shows that substitutability is
potentially violated in in-methods upon generalization. These are
perfectly immutable.
Another wrong claim is that all this has anything to do with
"computerized" ellipses and circles as opposed to the "mathematical"
ones. In fact, LSP is violated by *pure* mathematical
circles. Consider the following theorem:
"forall x,y Real there exist <ellipse>, such that its axis have the lengths
of x and y."
Substitute <ellipse> by <circle> and the above will become untrue. The
reason is that "exist" quantifier is incompatible with specialization
(injective mapping of the domain set.) This is why out-methods are
broken upon specialization. "Forall" isn't better, it obviously breaks
in-methods in generalization. There is a full duality between
them. With multi-methods, which take both in- and out-parameters (the
arguments and the result), either specialization or generalization
breaks LSP. In general, LSP would require bijective mappings of
values, which, of course, would make subtyping useless.
(Mathematics is OK, LSP is not.)
What the paper correctly observes, is that there is in- and
out-inheritance. A further move to refine it, could be conditional
substitutability and partial inheritance.
>
> This can be treated as an optimization problem rather than a matter of
> language definition. Small objects can be used by the implementation when
> the value in question is known at compile time not to be polymorphic.
It could be. Still this knowledge should be reflected in types, because
objects would in effect have different representation. So it is reasonable
to have them of different, though related types.
--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
|
|
|
|
|