For Programmers: Free Programming Magazines  


Home > Archive > Compilers > October 2006 > HLL design









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author HLL design
fermineutron

2006-10-21, 4:04 am

This is somewhat of a theoretical question for the experts here. Most
of easy to use languages are too slow for any serious computing while
the languages like C and assembly are somewhat of a pain to use. The
key issues that i identyfy as pain include but not limited to,
variable declaration array boundary checking, strict data types
etc. Most can be solved by using extended variables that carry key
information about the data in the variable within the variable, for
example using the 1st byte to designate the data type of a variable,
int, float double etc. using second byte to designate the number of
dimensions in the variable lets say its N, then using N 4-byte
integers to store dimension sizes, and this header is followed by the
data itself.

Now my question is:

Assuming that the system that is to work with such variables is
inteligently designed, do you think it will be significantly slower
than C? Is there a good reason why there is no "efficient" programming
language which uses such or simmilar system to store variables?


Basically i want a programming language which has the speed and
flexibility of C, while user friendlenness of MatLab.


I am thinking about starting a project to write such a language, but
before i imbark of such a great crue i want to hear what people who
are by far more experienced than me have to say about it?

Is it really impossible to design a compiler which translates higher
level code than C into efficient assembly?
Dmitry A. Kazakov

2006-10-21, 7:03 pm

On 21 Oct 2006 02:27:47 -0400, fermineutron wrote:

> This is somewhat of a theoretical question for the experts here. Most
> of easy to use languages are too slow for any serious computing while
> the languages like C and assembly are somewhat of a pain to use. The
> key issues that i identyfy as pain include but not limited to,
> variable declaration array boundary checking, strict data types
> etc. Most can be solved by using extended variables that carry key
> information about the data in the variable within the variable, for
> example using the 1st byte to designate the data type of a variable,
> int, float double etc. using second byte to designate the number of
> dimensions in the variable lets say its N, then using N 4-byte
> integers to store dimension sizes, and this header is followed by the
> data itself.
>
> Now my question is:
>
> Assuming that the system that is to work with such variables is
> inteligently designed, do you think it will be significantly slower
> than C?


There is no reason why it should be slower.

What you describe is the concept constrained objects. The array ranges
is a constraint put on an unconstrained array. The type tag is a
constraint put on a polymorphic object, which determines its specific
type (for example vtab_ptr in C++). When all constraints are known at
compile time, the object is said to be constrained. When some of them
are not, then the object is unconstrained. If you know the constraints
of a set of objects, they form a constrained subtype. Important is
that for constrained subtypes you don't need to keep / query /
evaluate constraints at run-time. Which technically means that you
loose neither space nor performance. What you need is to change the
object representation depending on the constraints. The simplest form
of this is to remove known constraints. More elaborated variants could
go further. The price to pay is loosing by-reference parameter passing
in some cases.

> I am thinking about starting a project to write such a language, but
> before i imbark of such a great crue i want to hear what people who
> are by far more experienced than me have to say about it?


I would recommend you to study Ada design, which deploys this model.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
glen herrmannsfeldt

2006-10-21, 7:03 pm

fermineutron wrote:

> This is somewhat of a theoretical question for the experts here. Most
> of easy to use languages are too slow for any serious computing while
> the languages like C and assembly are somewhat of a pain to use. The
> key issues that i identyfy as pain include but not limited to,
> variable declaration array boundary checking, strict data types
> etc. Most can be solved by using extended variables that carry key
> information about the data in the variable within the variable, for
> example using the 1st byte to designate the data type of a variable,
> int, float double etc. using second byte to designate the number of
> dimensions in the variable lets say its N, then using N 4-byte
> integers to store dimension sizes, and this header is followed by the
> data itself.


This is pretty much the descriptor used by languages that pass arrays
by descriptor. PL/I, assumed shape arrays in Fortran, and probably
others. I don't know that they include the type, but the dimensions
and the address of the actual array.

> Assuming that the system that is to work with such variables is
> inteligently designed, do you think it will be significantly slower
> than C? Is there a good reason why there is no "efficient"
> programming language which uses such or simmilar system to store
> variables?


An array access takes one or two machine instructions per array
dimension on most machines. Add maybe four per dimension to do bounds
checking, so each takes three times longer.

Because of this, the VAX designers added a special instruction to do
array indexing with bounds checking. This instruction turned out to
be slower than doing it the usual way with separate instructions, and
so was rarely used.

Many compilers now have an option for bounds checking, and it is
a good idea to use it.

> Basically i want a programming language which has the speed and
> flexibility of C, while user friendlenness of MatLab.


For an interpreted language like MatLab, Mathematica, R, etc., an
array access takes somewhat longer. The extra time to do bounds
checking, relative to the access itself, isn't so bad, and besides
people expect it.

> I am thinking about starting a project to write such a language, but
> before i imbark of such a great crue i want to hear what people
> who are by far more experienced than me have to say about it?


Well, Java requires bounds checking, and with JIT on isn't so much
slower than compiled code. It is probably possible to make some
language design decisions that will make bounds checking easier.

If a compiler can determine at compile time that, for example, a loop
variable can't go outside the array, or can test the loop conditions
once instead of each time, it could be faster. I believe some
compilers do that, but I don't know that any language includes it in
the language definition.

> Is it really impossible to design a compiler which translates higher
> level code than C into efficient assembly?


Not really, but not doing the check is pretty much always faster
than doing it.

-- glen
Pascal Bourguignon

2006-10-21, 7:03 pm

"fermineutron" <free4trample@yahoo.com> writes:
> Basically i want a programming language which has the speed and
> flexibility of C, while user friendlenness of MatLab. ...


> Is it really impossible to design a compiler which translates higher
> level code than C into efficient assembly?


http://www.lrde.epita.fr/dload/pape...na.06.imecs.pdf
http://www.lrde.epita.fr/~didier/co...earch/publi.php
http://p-cos.net/lisp-ecoop06/pdf/112959.pdf

--
__Pascal Bourguignon__ http://www.informatimago.com/
Wolfram Fenske

2006-10-21, 7:03 pm

"fermineutron" <free4trample@yahoo.com> schreibt:

> This is somewhat of a theoretical question for the experts here. Most
> of easy to use languages are too slow for any serious computing while
> the languages like C and assembly are somewhat of a pain to use. The
> key issues that i identyfy as pain include but not limited to,
> variable declaration array boundary checking, strict data types
> etc. Most can be solved by using extended variables that carry key
> information about the data in the variable within the variable, for
> example using the 1st byte to designate the data type of a variable,
> int, float double etc. using second byte to designate the number of
> dimensions in the variable lets say its N, then using N 4-byte
> integers to store dimension sizes, and this header is followed by the
> data itself.
>
> Now my question is:
>
> Assuming that the system that is to work with such variables is
> inteligently designed, do you think it will be significantly slower
> than C? Is there a good reason why there is no "efficient" programming
> language which uses such or simmilar system to store variables?


About the claim that there was no "efficient" programming language
with these features:

OCaml is very fast. Its creators conservatively claim it's at least
50% as fast as a C++ program compiled by a good optimizing compiler,
but usually it's faster than that, in some occasions even faster than
C++. OCaml is a statically typed language but unlike C or Java it
doesn't require you to declare types because it uses type inference
[1]. It also has bounds checking.

Apart from OCaml, I hear that recent Common Lisp and Scheme native
code compilers (e. g. CMUCL, SBCL, Bigloo, Larceny, Allegro, and
Lispworks) produce very fast code. But to get the best performance
you usually have to declare variable types in critical functions,
which I gather you're trying to avoid.

Note that both OCaml and the Lisps support bignums.

As always, YMMV, because the performance of a particular language
depends greatly on the problem you're trying to solve.

Now about the "good reason" why most languages are slower than C part:

Basically, it's like this: to produce efficient code, a language has
to be pretty static, because if more work can be done statically by
the compiler, less work has to be done at runtime. Statically typed
languages like C or OCaml eliminate all type information during
compilation so they have no overhead whatsoever when reading or
writing variables. If OTOH your variables carry type information at
runtime, they are a) bigger and won't fit into a single machine
register like, e. g., an int in C, b) you usually need to dereference
a pointer, and c) an additional offset is needed to access the actual
value.

Another reason why C is so fast is that it doesn't do any safety
checks at runtime. Everything that could possibly require runtime
support or otherwise slow down the programm is "undefined behavior".
E. g. it has no grabage collector. Or when an integer overflow occurs
in a C program, the value just wraps around. In Lisp, OTOH, the
result would automatically be promoted to a bignum, but checking for
overflows after each arithmetic operation is of course much more
costly than the C solution.

> Basically i want a programming language which has the speed and
> flexibility of C, while user friendlenness of MatLab.


You have to find a compromise between speed and user friendliness.
You can't have it both ways. OCaml, because of it's type inference,
is able to produce very fast executables, and usually you get away
with much less lines of code than an equivalent C solution. In my
opinion, it's going to be very hard to beat OCaml if you want the
features you mentioned. But maybe you can beat MatLab, I don't know
how optimized it is, or whether it uses just in time compilation,
etc.. In one discussion, somebody claimed that he had reimplemented
the core of Mathematica in a couple lines of OCaml and it was faster
than the original. [2] But I don't know if there's any truth to it.

> I am thinking about starting a project to write such a language, but
> before i imbark of such a great crue i want to hear what people who
> are by far more experienced than me have to say about it?
>
> Is it really impossible to design a compiler which translates higher
> level code than C into efficient assembly?


OCaml is much higher level than C, so I'd say it's possible. The
question is how much performance are you willing to sacrifice and how
much time/money/people you have to write and optimize the compiler.


Wolfram

Footnotes:
[1] <http://en.wikipedia.org/wiki/Type_inference>
[2]
<http://groups.google.com/group/comp...a7d24b5?rnum=44>

Hans-Peter Diettrich

2006-10-21, 7:03 pm

fermineutron wrote:

> Assuming that the system that is to work with such variables is
> inteligently designed, do you think it will be significantly slower
> than C? Is there a good reason why there is no "efficient" programming
> language which uses such or simmilar system to store variables?


The amount of extra data doesn't count at runtime, it's the *usage* of
such information. In C++ you can use classes and templates, to make
the objects behave as expected. With debug and retail modes, if you
like...

Sometimes more data can result in faster code, e.g. when the length of
a string is stored explicitly, and must not be determined by searching
for an '\0' string terminator.


> Basically i want a programming language which has the speed and
> flexibility of C, while user friendlenness of MatLab.


It's a matter of compiletime/runtime checks. When a language does not
allow for compiletime checks (of pointer usage...), everything must be
done at runtime. So it's a matter of the design of the objects, which
shall replace the known unsafe data types, and how far the users of
your system (language or class library) accept the resulting
restrictions, overhead in thinking and typing etc.


> Is it really impossible to design a compiler which translates higher
> level code than C into efficient assembly?


The efficiency should not be measured in an atomic view. Global
optimization can be improved by an appropriate formalism in a HLL
(high level language), but hardly in a LLL (low level, assembly,
C...). If you want type safety, bounds checking etc., the according
instructions will bloat assembly code as well. But only HLL code, with
appropriate additional information about data types, subroutine
arguments etc., will allow to determine, where the checks can be
safely *omitted* at runtime.

IMO ;-)

DoDi
Hans-Peter Diettrich

2006-10-21, 10:02 pm

glen herrmannsfeldt wrote:

> If a compiler can determine at compile time that, for example, a loop
> variable can't go outside the array, or can test the loop conditions
> once instead of each time, it could be faster. I believe some
> compilers do that, but I don't know that any language includes it in
> the language definition.


Delphi example:

for i := low(ar) to high(ar) do
do_something_with(ar[i]);

Here the compiler knows that i will always be within the array bounds.
low() and high() are compiler magics, which determine the bounds of
either static or dynamic arrays, or of other enumeratable or ordinal
types. The loop index must be a local variable (no aliasing), and cannot
be changed programmatically inside the loop. The implementation can use
an loop down-counter, instead of an index, and an pointer to the
elements. So the resulting code can be as fast as the same (but unsafe)
implementation in C.

DoDi
[The IBM PL.8 project also did a lot of work on optimizing away bounds
checks when the compiler could tell they weren't needed. Don't know
what became of that work. -John]

fermineutron

2006-10-21, 10:02 pm

Thank you everyone that replied. It will probaby take me a while to
process everything that has bess posted so far. Ill post again soon.

Another question is this:

If I do decide to write the new language, do you think that writing a
compiler into asm be the better bet or will translating variables into
C structures and handling everything on the level of C language be
good enough? Natureally then there would be a need do use a C compiler
to actually produce the exe file.
[Depends on whether your data structures are ones that C can
represent. If one stack and no garbage collection are adequate, C
should work fine. -John]

Daniel C. Wang

2006-10-30, 7:32 pm

fermineutron wrote:
{stuff deleted}
> Basically i want a programming language which has the speed and
> flexibility of C, while user friendlenness of MatLab.


Some data points

http://www.ffconsultancy.com/free/r...cpp_vs_sml.html

since you mentioned Mathlab you might also be interested in FISh

http://www-staff.it.uts.edu.au/~cbj/FISh/fish.html

idknow@gmail.com

2006-10-30, 7:32 pm

fermineutron wrote:
> Thank you everyone that replied. It will probaby take me a while to
> process everything that has bess posted so far. Ill post again soon.
>
> Another question is this:
>
> If I do decide to write the new language, do you think that writing a
> compiler into asm be the better bet or will translating variables into
> C structures and handling everything on the level of C language be
> good enough? Natureally then there would be a need do use a C compiler
> to actually produce the exe file.
> [Depends on whether your data structures are ones that C can
> represent. If one stack and no garbage collection are adequate, C
> should work fine. -John]


Fermineutron, greetings.

I'd suggest that you follow the common maxim and optimise the 10% of
the code that is run 90% of the time

instead of just willy-nilly(TM, technical term) optimising.

enjoy, keep us posted!

Bjarke Walling

2006-10-30, 7:32 pm

fermineutron skrev:
> Another question is this:
>
> If I do decide to write the new language, do you think that writing a
> compiler into asm be the better bet or will translating variables into
> C structures and handling everything on the level of C language be
> good enough? Natureally then there would be a need do use a C compiler
> to actually produce the exe file.


That depends. I am writing a compiler which produces C output from some
language I designed. Then you would compile the C code afterwards, or
my compiler could call a known C compiler itself. I chose this method
because of a number of advantages:
* It is easier for me to write this compiler instead of writing
assembly instructions (I could do it, but it would take longer time)
* I don't have to worry about specific optimization problems if I
assume that I use an optimizing C compiler.
* My language is more or less independant of the architecture and
platform if I stick to the C standard. Ie. you can use another C
compiler to compile for your specific platform.

This is only possible because my structures is representable in C. In
some of the structures I add runtime type information that I need. I
may also write a garbage collector in C if I find that I need it, and
it is easier (with my knowledge) to write it in C than writing it in
pure assembly.

I might go for compiling directly to assembly in the future. The
advantage is that you have full control about how the application is
running and maybe you are able to optimize your code better than the C
compiler.

So basicly, it is up to you - what are the advantages and what are
the drawbacks.

- Bjarke Walling

DavidM

2006-10-30, 7:32 pm

fermineutron wrote:
> This is somewhat of a theoretical question for the experts here. Most
> of easy to use languages are too slow for any serious computing while
> the languages like C and assembly are somewhat of a pain to use. The
> key issues that i identyfy as pain include but not limited to,
> variable declaration array boundary checking, strict data types
> etc. Most can be solved by using extended variables that carry key
> information about the data in the variable within the variable, for
> example using the 1st byte to designate the data type of a variable,
> int, float double etc. using second byte to designate the number of
> dimensions in the variable lets say its N, then using N 4-byte
> integers to store dimension sizes, and this header is followed by the
> data itself.
>


You could look into D:
www.digitalmars.com/d/

It has arrays of any type which have a built in .length property and a
convenient foreach(..) method to iterate through them. Changing the
size of the array is as simple as:
array.length = array.length + 100;

it has auto types which is basic type-inferencing:
auto x = callSomeFunction();
x.doSomething();

It has many other features: scope-dependent operations, inner
functions(very useful!), mixins, garbage collection, built-in
associative arrays, delegates, inner classes and more.

You should give it a try.
-DavidM

Hans-Peter Diettrich

2006-10-30, 7:32 pm

Bjarke Walling wrote:

> * My language is more or less independant of the architecture and
> platform if I stick to the C standard. Ie. you can use another C
> compiler to compile for your specific platform.


More less, I think. If your language allows for filehandling and other
OS specific features, your libraries are more important than the
compiler, with regards to portability.

DoDi

Bjarke Walling

2006-10-30, 7:32 pm

Hans-Peter Diettrich skrev:
> Bjarke Walling wrote:
>
> More less, I think. If your language allows for filehandling and other
> OS specific features, your libraries are more important than the
> compiler, with regards to portability.


Yes, you are right. I try to solve this another way, but regarding the
question of fermineutron the libraries are more important to
portability.

- Bjarke Walling

Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com