| James Giles 2005-02-03, 3:59 am |
| I've now become rather over-committed with respect to the newsgroup
since I have promised to publicize the present crop of Fortran
committee working documents and to fully describe (again, but with
more detail) my proposal for generic programming. I'll start on
the generic programming feature.
I suppose I should define what I'm talking about. There are several
forms of generic programming, or polymorphism. The simplest, in some
respects, is the Lisp model: any name can be assigned a value of any
type, you can even have a procedure as a value. (I'm talking loosely
here, Lisp has *some* limitations.) This has a downside that no type
errors are caught until you actually try a forbidden operation on a
type. Still, it can be implemented with some efficiency with compile-
on-the-fly creation of instances (see below) of procedures for types
that are actually used.
Another form of polymorphism that is simle in another way is called
ad-hoc polymorphism. This is what F90 introduced into Fortran with
the advent of generic interface blocks. What you have is several
different procedures that do different things with different types
or numbers of arguments and you arrange to reference them all through
a common name. It's probably better if they all do something similar,
or at least related to the common name you choose, but it's not required.
In a statically typed language (such as mostly is true of Fortran)
this is also the most powerful form of polymorphism. Its also a
rather inconvenient one. In a compiled language with static type
rules it yields good efficiency.
Inheritance based programming (which is usually considered the
defining characterstic of Object Oriented Programming) introduces
a limited form of polymorphism. Procedures can be written that
apply to a type (class) or any of its extensions (subclasses).
In most languages you can have either statically bound (and so,
efficient) polymorphic procedures, or you can have dynamically
bound (and less efficient) ones. The statically bound ones can
usually be several different procedures with common name that do
different things to different types, similar to the ad-hoc generics.
An extreme example of dynamically bound polymorphism would be an
unrestricted polymorphic procedure. Here the argument(s) might be
of any type defined (or intrinsic) in the whole program. Aside
from being slow, there's little they can do since no operators
are defined on *all* the data types you have, nor are any procedures
other than another unrestricted polymorphic one. Fortran 2003 has
these. Type inquiry and special forms of SELECT allow you to sometimes
determine enough about an argument's type to call some less rarified
kind of procedure.
The last variety of polymorphic programming I'll mention here (and
the one I'll be proposing in the following articles) is parametric
polymorphism. In this form you write a single procedure that can
be applied to several different types of arguments. The possible
types are statically bound, so a compiled implementation can run
efficiently. The set of different types the procedure can be
applied to are expressed parametrically from within the environment
(other modules, type definitions, etc.).
.....
I will begin six different threads to cover the generic proposal.
All but the first concern more detailed descriptions of a method
to express constraints on the properties of actual arguments of
a procedure. In sum, the ability to express these constraints
provides several capabilities that have been proposed by others
(including members of the committee) that are all similar in
that they all deal with expressing explicitly some additional
requirements on the actual arguments of a procedure besides just
Type, KIND, and Rank (TKR) compatibility. They also provide
the additional mechanism required to declare generic procedures.
The first part describes a feature that simplifies the declaration
of generic procedures (even F95 style ones with a separate procedure
for each type signature) so that specific names and interface
blocks are not required. This will be important later because
it will be quite inconvenient to have to dream-up and use separate
names for specific procedures that are never even written out
separately. In any case, the need for interface blocks for new
generics (rather than F77 legacy code) was always somewhat of
an anomaly. Ordinary module procedures and internal procedures
didn't need interface blocks, why do generics?
Below is a list of subject lines for the articles I plan to post.
The order and titles may change if, while writing, I decide the
subject needs to be presented differently. But I don't think it'll
happen. This is an informal venue though, so it's not a major issue.
Part 1 of Short Steps Toward Generic Programming: The GENERIC keyword
Part 2 of Short Steps Toward Generic Programming: Specification Assertions
Part 3 of Short Steps Toward Generic Programming: Attribute Inquiry
Part 4 of Short Steps Toward Generic Programming: KIND Genericity
Part 5 of Short Steps Toward Generic Programming: TYPE Genericity
Part 6 of Short Steps Toward Generic Programming: GENUS Declarations
This covers generic procedures. It's also possible to consider
a feature to support generic data types. I'll get to those
subsequently if there's interest. One form of polymorphic
data type deserves some consideration: variant derived types
(or unions, or whatever they should be called). Another form
of polymorphic type is already in the language in the form of
parameterized types (both KIND and non-KIND, the latter peculiarly
called LEN). And, of course, inheritance-based types can be, in
a sense, polymorphic.
.....
A word about terminology: I used above the phrase "type signature"
and I will probably say other things that some readers are not
familiar with. I'm writing hurriedly, so I don't think I can
quite anticipate what most of these might be, so bear with me.
The type signature is the number of arguments a procedure has,
as well as the declared properties of each argument and their
order. In addition, in Fortran the dummy argument names are part
of the type signature since procedures can be referenced with out
of order arguments using the dummy names used as designators (the
so-called "keyword" syntax). The relevant properties of an argument
vary with different languages - in Fortran they consist of type, KIND,
and rank (whether the argument is an array, and if so how many
dimensions it has). In part 3, I will be recommending that more
properties be added to what constitutes the type signature of a
procedure.
For non-generic programming, the main use of the type signature is
to describe the requirements on references. Each actual argument's
properties must be consistent with the procedure's declaration of
the corresponding dummy argument. This same requirement exists for
generic procedures except that the dummy arguments are declared in
such a way that there are possibly more than one set of properties
acceptable for each. The type signature is used to determine which
of a set of generic procedures will actually be called. And the
type signature will be used to determine which instance of a given
generic procedure will be used.
In the above, the word "instance" is used. Its use refers to the
common implementation of Fortran as a compiled language and that
a given generic procedure may actually generate several distinct
sequences of executable code that will be selected from based on
the type signature of the actual arguments in the calling reference.
In a compiled implementation, most procedures only generate one
instance.
Note that a single instance of a generic may handle multiple type
signatures. For example, a generic SWAP that exchanges the values
of its two arguments a can use identical code for argument types
INTEGER, REAL, and LOGICAL of default KINDs since those are guaranteed
to be the same storage size and the procedure makes no use of any
other property of the arguments. Different instances of SWAP may
be created for types that have other storage sizes (an instance to
SWAP COMPLEX arguments, for example, would also be the right instance
to SWAP double precision REALs). On the other hand, there should
never be more than one instance for the same type signature.
.....
A note about syntax. I believe that syntax is the most important
aspect of a language's design. A language that has really bad
semantic rules may still be pretty good if the syntax is clear
and even the limitations or foibles of the semantics are readily
understood. A language that has bad syntax has no chance of being
a good language no matter how impeccable it's other properties.
So, that said I think the syntax of a feature should be among the
first things discussed and the last to be settled. Such choices
as keywords and layout should remain flexible so as to best reflect
the actual meaning (which may change as the semantics of the feature
is fine-tuned). It's not something to obsess over, but just something
to reconsider from time to time.
For example: I'm presently leaning toward using the CLASS(*)
declaration form F2003 as the way of declaring the polymorphic
arguments of generics. This is because the meaning within F2003
is exactly what I want: it says that the procedure is polymorphic
with respect to that argument (the argument may be of many types).
Without additional ASSERTion (see part 2) there will be no limit
at all on what type the argument could be. However, the word
"class" suggests the feature is somehow associated with the
hierarchical programming feature (even though the F2003 explicitly
notes that there is no connection). I may want the syntax to be
TYPE(*) instead. (Also, there has been the suggestion that CLASS(*)
must always refer only to dynamic types, not static ones. Another
argument against it.)
--
J. Giles
"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare
|