For Programmers: Free Programming Magazines  


Home > Archive > Compilers > March 2004 > Re: Two questions about compiler design









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Re: Two questions about compiler design
Jens Troeger

2004-03-27, 12:28 am

> > I've basically read most of the dragon book, and according to that book ...

> Try reading Cooper & Torczon's new book, Engineering A Compiler.
> It's much more up to date, and very practical.


I found Michael Scott's book "Programming Language Pragmatics" a great
read too. It's from 1999 (or is that already considered out-dated??)
and also addresses briefly different language paradigms. It goes the
whole way, chapter by chapter, from scanning to parsing, grammars and
IR's to code improvement and instruction selection.

Cheers,
Jens
Scott Moore

2004-03-27, 12:28 am

"Isaac" <isaac@latveria.castledoom.org> wrote

> I tried searching for PCode references a few years back and didn't
> come up with much. Could you provide a pointer or two?
>
> Isaac


Commonly a reference to the intermediate format used in Wirth's
porting package for Pascal. See my page on it at:

http://www.moorecad.com/standardpascal/p4.html

Its basically just the intermediate for a Pascal, it seems to
garner attention because it was the first widely used "virtual
machine" solution, and even had a hardware processor specifically
created for it.

None of that was ever intended by the authors of the program. They
just wanted to help others get compilers going faster.
Nick Maclaren

2004-03-27, 12:28 am

Scott Moore <samiam@moorecad.com> wrote:
>"Isaac" <isaac@latveria.castledoom.org> wrote
>
>Commonly a reference to the intermediate format used in Wirth's
>porting package for Pascal. See my page on it at:
>
>http://www.moorecad.com/standardpascal/p4.html
>
>Its basically just the intermediate for a Pascal, it seems to
>garner attention because it was the first widely used "virtual
>machine" solution, and even had a hardware processor specifically
>created for it.


I don't think so. Ocode was somewhat earlier, and I have some vague
memories of others. If you count total implementations (alive and
dead), I think that you will find that Ocode is still a long way ahead
of Pcode. BCPL was incredibly widely ported, because it was often
ported as a basis for specific packages (Cambridge LISP, among several
others).

Wirth caught the zeitgeist in a way that Richards didn't, which is why
the public perception of Pascal being more widely available than BCPL
comes from. But I don't think it was so and, because only some
Pascals have used Pcode but almost all BCPLs have used Ocode, I doubt
very much that Pcode ever got near Ocode in number of ports.

However, Pascal overtook BCPL in terms of LIVE implementations some
time ago. I can't guess which would be ahead in historical ones.


Regards,
Nick Maclaren.
Tom Linden

2004-03-27, 12:28 am

>Commonly a reference to the intermediate format used in Wirth's
>porting package for Pascal. See my page on it at:
>
>http://www.moorecad.com/standardpascal/p4.html
>
>Its basically just the intermediate for a Pascal, it seems to
>garner attention because it was the first widely used "virtual
>machine" solution, and even had a hardware processor specifically
>created for it.


I don't think so. Ocode was somewhat earlier, and I have some vague
memories of others. If you count total implementations (alive and
dead), I think that you will find that Ocode is still a long way ahead
of Pcode. BCPL was incredibly widely ported, because it was often
ported as a basis for specific packages (Cambridge LISP, among several
others).

Wirth caught the zeitgeist in a way that Richards didn't, which is why
the public perception of Pascal being more widely available than BCPL
comes from. But I don't think it was so and, because only some
Pascals have used Pcode but almost all BCPLs have used Ocode, I doubt
very much that Pcode ever got near Ocode in number of ports.

However, Pascal overtook BCPL in terms of LIVE implementations some
time ago. I can't guess which would be ahead in historical ones.

Between the two came the n-tuple design that Freiburghouse developed
for PL/I which was widely used for a number of languages by Digital,
Wang, Prime, DG, Honeywell, CDC, Stratus and others I can't recall.
This originally derived from the Multics PL/I (see
http://www.multicians.org/ ) which had introduced the concept of usage
counts for register allocation. Currently only used by PL/I on VMS
and Tru64 Platforms also VaxC used it as did the SCAN and Pearl and (I
think) Coral 66 on the VAX. The symbol table was not embedded in the
IL but was separately built, hierarchical and tree-structured.
Chris F Clark

2004-03-27, 12:29 am

Tom Linden wrote:
> Between the two came the n-tuple design that Freiburghouse developed
> for PL/I which was widely used for a number of languages by Digital,
> Wang, Prime, DG, Honeywell, CDC, Stratus and others I can't recall.


Ah, fond memories rekindled. Freiburghouse's IL was fairly close to a
useful UNCOL. At Prime, we had frontends for it for PL/I (at least 3
dialects), Fortran, Cobol, Pascal, Basic, Modula-2, and RPG that were
shipped to customers and supported. In house, Kevin Cummings wrote an
Algol-60 frontend as a fun project. I'm pretty sure a C frontend was
written also, but was not the compiler that got shipped to customers.
The best part was that most of the backend and scaffolding for all
those projects was common.

At Prime we even built our own improved global optimizer for the IL.
We started to build a third version of the global optimizer based on
Fred Chow's Stanford thesis, but that fell prey to the second system
syndrome and died a death due to management not wanting to fund the
project unless it had no risks and the project engineers adding things
to reduce the risk, but never willing to say that there were none. (I
left when the project plan was over 100 pages with no end in sight.)

In addition to the hardware vendors Tom listed above, I know two
compiler houses made a good living off the technology, TSI and LPI.
Later in my career I did a stint with LPI.

Having mentioned Fred Chow, it is worth tying this back to P-code.
His thesis used a variant of P-code with register information that he
called U-code. There were several different U-code frontends built
also. I recall C and Fortran at DEC. I was there when they were
retiring the U-code backend, replacing it with the GEM backend.

Having experienced both, I think the U-code IL was better for many
compiler purposes, but not as good an UNCOL. The global optimizer
technology associated with U-code was certainly better, both simpler
to maintain and more sophisticated. In contrast, I think the
Freiburghouse code generator technology was better, especially from an
easy of maintenance standpoint. Part of this was due to the fact the
the Freiburghouse IL was not as close to the machine and let more
frontend semantics peer through. For example, when looking at a
"reference" to a variable in the Freiburghouse IL, one had to know
which frontend produced the reference for certain aspects of its
semantics--that made it a better UNCOL, becase the frontend didn't
have to bend so much to match some other languages memory model.

The distance from the machine helped at code generation time, because
each IL operator stood for something more or less complete and one
could then factor the cases as code generation time. There was a
simple but useful "language" used by the code generators to implement
those semantics. That made all the difference. It made the code
generator into something one could read easily and understand the code
sequences coming out.

In contrast, with the U-code IL, semantics were composed from more
primitive operations and code generation worked by matching
patterns--think BURG. While one can express all the same decisions
that way and perhaps more, it is much less clear to a code generator
writer how a small change will affect the code generated. Of course,
exposing all of those details in the IL was part of what made the
U-code optimizer good. The optimizer could easily rewrite unncessary
operations out of the program, because the operations were all exposed
in the IL. In the Freiburghouse IL, many of those things were more
implicit and thus inaccessible to the optimizer.

Perhaps the most striking thing about that difference is how it was
actually localized to a small part of the different IL's. If one
looked at the opcode listings for both, they would be mostly
identical. The key difference being in the memory access opcodes,
Freiburghouse's IL had only a generic "reference" operation, where
U-code has explicit load and store operations that used explicit
arithmetic to calculate the exact memory location. However, that
semantic difference ripples through the IL and completely changes
everything. One could easily implement a Freiburghouse on nearly any
architecture, memory-to-memory, a few registers, many registers, stack
based, byte addressible, word addressible--the IL was architecure
neutral. In constrast, U-code was optimized toward many register
machines with a load-store architecture (with a preference toward byte
addressible machines). If your machine doesn't look like that, U-code
isn't quite as useful.

Of course, in the current time, we seem to have a paucity of different
architectures, but that won't last forever, I hope. And, that brings
up the final and perhaps most important point.

Freiburghouse designed his IL in a time when many architectures were
around and none were prevelant. Making his IL into an UNCOL,
particularly for different underlying machines was important. For him
investing a few more hours into developing a code generator for a new
machine/language combination actually meant money in his pocket, so he
wanted that task simple enough that he could effectively do it. The
ability to optimize the code on those machines was relevant but the
opportunities were more limited.

The U-code architecture reflects the great shift to more of an
atchitectural monoculture. Different ports were not as different (at
least in terms of basic machine semantics) and getting a more relevant
was achieving optimized results on the machines which were becoming
dominant, machines whose architecture is designed for C-like
languages, byte addressible uniform memory access and a reasonable
sized register file.

Well, I've rambled on enough on this topic. I hope something in here
was of interest....

Hope this helps,
-Chris

****************************************
*************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
Tom Linden

2004-03-27, 12:29 am

Tom Linden wrote:
> Between the two came the n-tuple design that Freiburghouse developed
> for PL/I which was widely used for a number of languages by Digital,
> Wang, Prime, DG, Honeywell, CDC, Stratus and others I can't recall.


Ah, fond memories rekindled. Freiburghouse's IL was fairly close to a
useful UNCOL. At Prime, we had frontends for it for PL/I (at least 3
dialects), Fortran, Cobol, Pascal, Basic, Modula-2, and RPG that were
shipped to customers and supported. In house, Kevin Cummings wrote an
Algol-60 frontend as a fun project. I'm pretty sure a C frontend was
written also, but was not the compiler that got shipped to customers.
The best part was that most of the backend and scaffolding for all
those projects was common.

Prime's C front-end was written by Conboy and did indeed conform to
the PL/I IL and Symbol table design. I no longer have the souces
around in electronic form, but I found a while back a listing of the
Imode code gen for the 50 series. In fact the code gen I did for the
Alpha was a distant offshoot of this.

At Prime we even built our own improved global optimizer for the IL.
We started to build a third version of the global optimizer based on
Fred Chow's Stanford thesis, but that fell prey to the second system
syndrome and died a death due to management not wanting to fund the
project unless it had no risks and the project engineers adding things
to reduce the risk, but never willing to say that there were none. (I
left when the project plan was over 100 pages with no end in sight.)

Ah, oprimization. Interesting from several points of view. When
Cutler et. al. did the VAX code gen they basically followed Aho,
Sethi and Ullman. But the interesting thing is that you can get 80%
of the optimizations with a rather simple optimizer, at a fraction of
the cost to develop - and maintain.

In addition to the hardware vendors Tom listed above, I know two
compiler houses made a good living off the technology, TSI and LPI.
Later in my career I did a stint with LPI.

Well actually, I don't know if you would call it a good living, but
TSI morphed into Kednos which today only does PL/I for VAX and Alpha
and next year Itanium.

Having mentioned Fred Chow, it is worth tying this back to P-code.
His thesis used a variant of P-code with register information that he
called U-code. There were several different U-code frontends built
also. I recall C and Fortran at DEC. I was there when they were
retiring the U-code backend, replacing it with the GEM backend.

I believe the Ucode is what Hennessey used for the MIPS compilers.
Interestingly, most of the GEM staff came out of Digital's PL/I group
which had develpoed VCG. There focus was largely C so they eliminated
frame pointers, which stack unwinfing a bit of a chore and indeed
required one to use undocumented features.

Having experienced both, I think the U-code IL was better for many
compiler purposes, but not as good an UNCOL. The global optimizer
technology associated with U-code was certainly better, both simpler
to maintain and more sophisticated. In contrast, I think the
Freiburghouse code generator technology was better, especially from an
easy of maintenance standpoint. Part of this was due to the fact the
the Freiburghouse IL was not as close to the machine and let more
frontend semantics peer through. For example, when looking at a
"reference" to a variable in the Freiburghouse IL, one had to know
which frontend produced the reference for certain aspects of its
semantics--that made it a better UNCOL, becase the frontend didn't
have to bend so much to match some other languages memory model.

Yes, the IL was much like an abstract, overloaded assembly language.
In fact, I, as an experiment, generated in the semantic pass of PL/I
IL code for bultin functions essentially writing the algorithms in IL
to be inlined rather than as a library call and this turned out to be
very simple and almost self documenting

The distance from the machine helped at code generation time, because
each IL operator stood for something more or less complete and one
could then factor the cases as code generation time. There was a
simple but useful "language" used by the code generators to implement
those semantics. That made all the difference. It made the code
generator into something one could read easily and understand the code
sequences coming out.

In contrast, with the U-code IL, semantics were composed from more
primitive operations and code generation worked by matching
patterns--think BURG. While one can express all the same decisions
that way and perhaps more, it is much less clear to a code generator
writer how a small change will affect the code generated. Of course,
exposing all of those details in the IL was part of what made the
U-code optimizer good. The optimizer could easily rewrite unncessary
operations out of the program, because the operations were all exposed
in the IL. In the Freiburghouse IL, many of those things were more
implicit and thus inaccessible to the optimizer.

Perhaps the most striking thing about that difference is how it was
actually localized to a small part of the different IL's. If one
looked at the opcode listings for both, they would be mostly
identical. The key difference being in the memory access opcodes,
Freiburghouse's IL had only a generic "reference" operation, where
U-code has explicit load and store operations that used explicit
arithmetic to calculate the exact memory location. However, that
semantic difference ripples through the IL and completely changes
everything. One could easily implement a Freiburghouse on nearly any
architecture, memory-to-memory, a few registers, many registers, stack
based, byte addressible, word addressible--the IL was architecure
neutral. In constrast, U-code was optimized toward many register
machines with a load-store architecture (with a preference toward byte
addressible machines). If your machine doesn't look like that, U-code
isn't quite as useful.

The ref operator was specifically designed with a view in mind to
efficiently reference a member of an indexed, based strucuture, Ucode
did not have that capability. This also helped with Fortran common.

Of course, in the current time, we seem to have a paucity of different
architectures, but that won't last forever, I hope. And, that brings
up the final and perhaps most important point.

What gave rise to all the different architectres was essentially the
AMD 2901 bit slice chips which made it possible to rather easily put
together a microcode architecture. The latest Xeon I believe has 125
million transistors each of which is about the sice of the DNA
molecule! It takes billions to create a plant that can produce those
kinds of chips, so I think diversity is not part of the future.

Freiburghouse designed his IL in a time when many architectures were
around and none were prevelant. Making his IL into an UNCOL,
particularly for different underlying machines was important. For him
investing a few more hours into developing a code generator for a new
machine/language combination actually meant money in his pocket, so he
wanted that task simple enough that he could effectively do it. The
ability to optimize the code on those machines was relevant but the
opportunities were more limited.

He actually formed most of his ideas while at Multics.

The U-code architecture reflects the great shift to more of an
atchitectural monoculture. Different ports were not as different (at
least in terms of basic machine semantics) and getting a more relevant
was achieving optimized results on the machines which were becoming
dominant, machines whose architecture is designed for C-like
languages, byte addressible uniform memory access and a reasonable
sized register file.

Well, I've rambled on enough on this topic. I hope something in here
was of interest....

Me too.

Chris F Clark

2004-03-27, 12:29 am

Tom Linden wrote:
> Prime's C front-end was written by Conboy and did indeed conform to
> the PL/I IL and Symbol table design.


Tom's memory is more reliable on mine at this point. I didn't
remember whether Garth Conboy's compiler used the Freiburghouse (TSI
or later "common" as we called it) backend or not--I just knew that
some C compiler did. Garth's compiler is the one that was shipped.

Two things about the Prime architecture are worth mentioning.

One is that pointers and integers were not compatibly sized (at least
not at the beginning)--where byte pointers took up more space (48
bits) in memory than integers (and could actually address individual
bits), but the "integer" part of a pointer couldn't hold a complete
integers worth of values (because some bits in the pointer were part
of the ring security system and the pointer manipulation instructions
would change them on you--setting them to secure values).

Garth actually came up with a neat work-around for this problem, using
the 32 bit word-address only form of the pointer, which had 1 bit in
it which indicated that the extended 48 bit form was used. I don't
know all the details, but essentially he co-opted that 1 bit to mean
other byte (the memory words were 16 bit quantities and thus had two
characters each). I presume when he actually used those pointers as
pointers, he put them in a special location where the actional 16 bits
simply addressed the "other" byte of the word. This gave him pointers
that were "no larger" than integers and could be stored in 32 bit
words (at least when not using them as "pointers"). I think by layout
and padding games he could even puth the 16 bit extensions down in
memory structures in such a way that moving the pointers as 32 bit
quantities would work, but in memory the 16 bits necessary to make
them "real" pointers also was there.

By the time the compiler was shipped, the hardware guys had added a
new instruction to the I-mode instruction set and created a new
instruction set around that model I*-mode. Note, Prime machines had
several different instruction sets they used internally, which
represented different strata in the machines history--there were two
forms of S-mode that corresponded to the original Honeywell (516?)
computers that Primes were originally designed to emulate, R-mode
which added recursive subroutine instructions, V-mode which added
virtual memory, I-mode which added a general register set, and I*-mode
which added character pointers based on Garth's C model. The
interesting thing is that because Prime machines were all micro-coded,
one could switch instruction sets on the fly (e.g. on any call
instruction and I think at other times). The machine booted in one of
the S-modes, but quickly entered R-mode to load the OS, which was
partially in R-mode but mostly in V-mode (while I was there). When
the OS executed user code, it was mostly V-mode or one of the I-modes.

The second point had to do with the Freiburghouse compiler technology,
although it was useful as an UNCOL, it wasn't perfect. The place I
remember this most dramatically was Pascal. We used to internally
call it the Pascal-subset-G compiler, because semantically it matched
the aspects of the PL/I-subset-G (called pl1g) compiler as closely as
it matched Pascal. It had extensions to the Pascal language (as well
as deficiencies) that would have been hard to remove because the
underlying run-time libraries and IL were designed to support PL/I.
As a result of these problems, we used to say (borrowing from a
perfume commercial popular at the time) "Promise them anything but
give them the common backend" our way of talking about the
fundamentals of the code generator, optimizer, and run-time library
that were Freiburghouse based.

> I believe the Ucode is what Hennessey used for the MIPS compilers.


Yes, Fred Chow's compiler was the one Hennessey used for the MIPS
compilers. That same suite was the basis of the DEC Unix compilers,
prior to GEM. It was also used by SGI (at one time; I don't know
about the delta compilers). On an interview with Microsoft, I had
some candid converstations that suggest that at least at one time, one
of the Microsoft C compilers was built on the same technology.

-Chris

****************************************
*************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
[The VAX VMS C compiler was also built on Freiburghouse's PL/I compiler.
Engineering a Compiler by Anklam et al is mostly about retargeting the
PL/I compiler to the VAX, but at the end there's a short discussion about
the C front end they also wrote and shipped. -John]
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com