For Programmers: Free Programming Magazines  


Home > Archive > Compilers > September 2004 > Compiler and interpreter origins









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Compiler and interpreter origins
Lauri Alanko

2004-07-28, 9:08 pm

I am trying to understand the mindset prevalent during the advent of
high-level programming languages, especially regarding compilers and
run-time evaluation. I hope someone can shed some historical insight.

Firstly, back when everything was done in pure machine code or
assembly, how common was the use of self-modifying code? Was it only
used for things like inlined loop counters, or was there anything like
run-time generation of non-trivial code? In a word, was it seen as a
useful thing that a von Neumann architecture general-purpose
microprocessor really was a universal Turing machine that could be
reprogrammed on the fly?

When batch compilers became popular, such flexibility was obviously
lost: you couldn't generate new Fortran code on the fly from your
original Fortran program. Was this ever seen as a problem?

Interpreters, of course, easily support run-time code generation
simply by allowing the interpreter to be called from within the
program. Lisp certainly supported eval from day one. But did
McCarthy's team _invent_ the idea of reading in high-level code at
run-time and then interpreting it (instead of compiling it to machine
code beforehand), or was interpretation an older idea? Of course UTMs
and universal functions had been known, but when was it realized that
the same idea could be applied in practice?

I'm sorry about the vagueness of these questions. Any remarks or
references relating to the subject are appreciated.

Lauri Alanko
la@iki.fi
[Back in the 1950s, people used every coding trick they could to
squeeze programs into tiny memories, including all sorts of
self-modifying code. For a famous, probably apochryphal, story that
captures the way that people programmed then, see
http://catb.org/~esr/jargon/html/story-of-mel.html

There were lots of interpreters in the 1950s, typically running a
machine-like code that was higher level than the real machine code,
e.g., with floating point and index registers that the underlying
machine didn't have. Those survived into the 1960s. The standard
float package on the PDP-8 was an interpreter that had instructions
with the same address formats as the real instructions so you could
use the regular assembler to code them by renaming the opcodes. I
think that the idea of having an external representation that mapped
to and from an internal representation that could be interpreted was
new to Lisp. For a long time it was seen as an interim hack until
Lisp 2 with a nicer syntax was ready. -John]

Jeff Kenton

2004-08-04, 3:57 am

Lauri Alanko wrote:

> Firstly, back when everything was done in pure machine code or
> assembly, how common was the use of self-modifying code?


Very common. For example, the IBM 7094 had a whole class of
instructions explicitly for writing self-modifying code --
instructions to load and store the opcode portion of words
(instructions), and the register field, and the address field.

There is a very nice paper by Ken Thompson on regular expression
matching that generates a state machine on the fly on the 7094. Each
chunk of code is a state. (It was granted US Patent 3,568,156,
entitled "Text Matching Algorithm," filed 9 Aug 1967, issued 2 Mar
1971, inventor Ken Thompson.)
--
-------------------------------------------------------------------------
= Jeff Kenton Consulting and software development =
= http://home.comcast.net/~jeffrey.kenton =
Dick Weaver

2004-08-05, 3:59 pm

Lauri Alanko wrote:
>
> I am trying to understand the mindset prevalent during the advent of
> high-level programming languages, especially regarding compilers and
> run-time evaluation. I hope someone can shed some historical insight.


Some mindsets that come to mind (reading over this list, I can't really
distinguish between mindset and environment):

1. Compilers were thought of as "Automatic Programming". An early
report on Fortran, 1957, was "The Fortran Automatic Coding System"

2. There was only one criteria for good vs. bad compilers: efficiency of
the generated code.

3. A mindset of limited resources; useful, economic, results were
produced on machines with memories so small that multiplier acronyms
("K" for example) were not needed in size specifications. And schedules
were in months, not years.

4. A computer could be a human being (who operated a Frieden or other
calculator). You can find books of mathematical tables where the
computers - the names of people - are listed on the title page.

5. Users with problems requiring computer solutions took their problem
to a programmer. Only after high-level languages "advented" (in
particular, FORTRAN) did it become COMMON for users to write their own
programs.

6. Cascading compilers. For example, FORTRANSIT (IBM 650) input was
FORTRAN, the compiler output was an IT program. The IT compiler
produced SOAP (the 650s assembly language) output. The SOAP processor
produced your object deck.

7. Languages were sometimes designed/implemented by people trying to
solve their own problems. IPL, LISP being examples.


> Firstly, back when everything was done in pure machine code or
> assembly, how common was the use of self-modifying code? ...


See "The Preparation of Programs for an Electronic Digital Computer",
Wilkes, Wheeler, & Gill, 1951, page 8, "Modification of orders by the
program".

> Was it only used for things like inlined loop counters, or was there
> anything like run-time generation of non-trivial code? In a word,
> was it seen as a useful thing that a von Neumann architecture
> general-purpose microprocessor really was a universal Turing machine
> that could be reprogrammed on the fly?


> When batch compilers became popular, such flexibility was obviously
> lost: you couldn't generate new Fortran code on the fly from your
> original Fortran program. Was this ever seen as a problem?


Well, it wasn't seen as a problem because you could do it if you wanted
to (I assume that few people wanted to! - another mindset thing). Not
in complete generality, but you are asking about the "advent" - when
things got started. Look up FORMAC (Sammet). I think you do things
like (where "set" assigns the expression, not the value of the
expression. Set is not the correct FORMAC keyword, but it will do for
this example)

set v1 = a + b
set v2 = c + d
v3 = v1 * v2

v3 would then have the value ac + ad + bc + bd
There must have been an evaluate function.

> Interpreters, of course, easily support run-time code generation
> simply by allowing the interpreter to be called from within the
> program. Lisp certainly supported eval from day one. But did
> McCarthy's team _invent_ the idea of reading in high-level code at
> run-time and then interpreting it (instead of compiling it to machine
> code beforehand), or was interpretation an older idea? Of course UTMs
> and universal functions had been known, but when was it realized that
> the same idea could be applied in practice?


Interpreters have "always" been here. Well, if not from day 1,
certainly from day 2.
Lookup "Speedcoding" for the IBM 701, about 1953. Or "Short Code" for
the UNIVAC. Or Laning/Zierler.

Fortran IV could "read in high-level code at run-time and then interpret
it". Only FORMAT specifications, but exactly the function you asked
about.
I don't know if this functionality was in any Fortran II implementations
or not, nor do I have dates. Comparable to LISP? Of course not, but you
asked about beginnings.

There was a language TRAC (Deutsch?) for the PDP-1 where the a program
could modify itself (I think).
glen herrmannsfeldt

2004-08-05, 3:59 pm

Lauri Alanko wrote:
(snip)

> Firstly, back when everything was done in pure machine code or
> assembly, how common was the use of self-modifying code?


(snip)

The IBM OS/360 Fortran library does it. I don't believe I know of any
compilers that generate self modifying code, though.

The IBM 360/91, one of the earlier machines doing out of order
execution, had special logic to detect modified instructions that had
been previously fetched.

IBM, as of OS/360, has three attributes they apply to load modules
(executable code files). Serially reusable, reentrant, and
refreshable. Reentrant and refreshable modules usually should not be
self modifying, and later OS would store then in read only page
frames.

Fortran library routines were serially reusable, but not reentrant or
refreshable.

Also, OS/360 channel programs were often self modifying.

-- glen
[The IBM 360 architecture specifically permits instruction modification
without extra serialization. I gather it's still a pain for hardware
implementers, although it's now much more comment to do an EXecute
instruction to run one instruction created at runtime. -John]

Rodney M. Bates

2004-08-09, 3:56 am

Lauri Alanko wrote:
>
> Firstly, back when everything was done in pure machine code or
> assembly, how common was the use of self-modifying code?


The first couple of machines I programmed (IBM 1620, IBM 1401) had no
address or index registers of any kind. So self-modifying code was
the only possible way to write things like loops going through arrays.
You could avoid getting hopelessly bogged down in tar, by working out
a couple of what would today be called design patterns and then
following them downright slavishly.

Later, the university paid extra for a 1620 "special feature" of
indirect addressing, which made it immeasurably easier. By the time I
discovered linked data structure, It was on an IBM 1410, which, as I
recall, had 3 index registers.
Nick Roberts

2004-08-09, 3:56 am

On 4 Aug 2004 02:44:48 -0400, Jeff Kenton <Jeffrey.Kenton@comcast.net>
wrote:

> Lauri Alanko wrote:
>
>
> Very common. ...


Indeed, I think in the /very/ early days of computing, when no machine
had more than one or two KB of memory*, it was generally necessary for
any practical program to do a lot of self-modification, as well as a
great deal of the loading of segments of code from the drum*.

It became a considerable art, squeezing functionality out of code that
could fit into such a tiny amount of memory. Unfortunately, it also
caused an assumption in the industry, one that lasted far too long,
that such self-modification was an acceptable technique.

Originally, the COBOL language had an

ALTER x TO PROCEED TO y

statement, which modified a GOTO statement (itself labelled x),
changing its destination from whatever it was before to y. As you can
imagine, programmers could -- and generally did -- achieve the most
spectacularly obfuscated code with this statement. It became quite
notorious, and quickly became deprecated and then removed from
compilers altogether.

Unfortunately, I think things have swung too far the other way, in
that even the humble GOTO is now considered verboten. Silly.
--
Nick Roberts

*Memory those days was 'core' memory, a three-dimensional wire grid
with a magnetic 'core' (a ring made from ferrite) around each triple
intersection. Each core stored one bit. It was a truly daft piece of
technology (by today's standards), but a fascinating work of art to
look at.**

*In those days, the fastest secondary storage device was a 'drum',
which was a rotating drum covered in magnetic material (like a disk),
with a read/write head that scanned from end to end. The last
generations of drums had multiple heads, and were very fast.***

[** - The more modern ones had core. Some of the earlier ones used
Williams tubes, CRTs that stored data (very unreliably) by the
presence or absence of spots on the CRT, or mercury tank or
magnetorestrictive delay lines which worked better but were hard to
program since you wanted to minimize the amount of time the comptuter
spent waiting for words to come around. Some computers like the
Electrodata 205, LGP 30, and IBM 650 ran straight off the drum. Core
was by comparison pretty simple, although you had do do strange things
to make big fast core banks, like sitting them in tanks of ing oil
in the IBM 7094. Typical core banks were 2 1/2 D (X, Y, and sense
wire) rather than 3D.

*** - Yup. The 650's drum in the mid 1950s spun at about 12K rpm,
a speed that disks have only recently matched. -John]

Nick Maclaren

2004-08-09, 3:56 am

Dick Weaver <rweaver@ix.netcom.com> wrote:
>Some mindsets that come to mind (reading over this list, I can't really
>distinguish between mindset and environment):
>
>1. Compilers were thought of as "Automatic Programming". An early
>report on Fortran, 1957, was "The Fortran Automatic Coding System"


Only in the sense of "automatic macro expansion". Remember that was
in the days when analysts wrote flow-charts and programmers coded them
into assembler.

>2. There was only one criterion for good vs. bad compilers:

efficiency of >the generated code.

Absolutely NOT! Certainly by the early 1960s, the concept of
debugging compilers existed, which were expected to have thorough
diagnostics, insert good checking and compile very fast. What
happened to that concept, I wonder? :-(

Even in the early 1950s, the memory and time requirements of a
compiler were an issue. I should have to ask whether anyone can
remember the relevant priorities.

> 5. Users with problems requiring computer solutions took their
>problem to a programmer. Only after high-level languages "advented"
>(in particular, FORTRAN) did it become COMMON for users to write

their own >programs.

Not in Cambridge! Here, users programmed in machine code and autocode
before Fortran was invented. EDSAC I was the first practical general
purpose computer, and EDSAC II was used by a wide spectrum of the
university (including Arts and Humanities). Of course, I am relying
on my seniors' claims, as as a mere stripling of 56 :-)

>6. Cascading compilers. For example, FORTRANSIT (IBM 650) input was
>FORTRAN, the compiler output was an IT program. The IT compiler
>produced SOAP (the 650s assembly language) output. The SOAP processor
>produced your object deck.


Yup. IAL on Titan - not a bad assembler. Typical optimising compilers
had half a dozen passes, often implemented as separate programs, and
some of them had external specifications.

>7. Languages were sometimes designed/implemented by people trying to
>solve their own problems. IPL, LISP being examples.


Aren't they still? Perl, Python, Java ....

Regards,
Nick Maclaren.

[The original Fortran computer emitted extremely good code because the
conventional wisdom was that compiled scientific code would be too
slow to be useful. I gather that not long after it was shipped, they
added an option to compile faster by skipping some of the optimization
and generate worse code, which everyone always used. Code speed
wasn't a big issue for Cobol, since commercial programs are invariably
I/O bound, but I don't know what the adoption issues were. -John]
John Slimick

2004-08-09, 3:56 am

Dick Weaver wrote:
> There was a language TRAC (Deutsch?) for the PDP-1 where the a program
> could modify itself (I think).


This isn't about TRAC, but (1967 or so) I spent a month trying to
understand the core of the Thor timesharing system for the PDP-1
written in PDP-1 assembler. I understood it all except for one place
where, in the code, in one location either of two data words would be
stored. One word was a "SZA" ("skip on zero accumulator") and the
other was a "SNA" ("skip on non-zero accumulator"). Why would anyone
modify a skip instruction? It was explained to me later that the code
had to be self-modifying. I swore that I would never write
self-modifying code, and I have kept my promise for 37 years. (And I
remember quite clearly who wrote that code, but I bet he doesn't
remember doing it.)

Klaus Wirth used to go off on self-modifying code; he
considered it a terrible evil.

john slimick
slimick@pitt.edu
[I missed the reference to Trac, a language in which I've probably
written more code than anyone. It's a strictly interpretive macro
expansion language somewhat similar to Unix m4, so just about anything
you write is by some standard self-modifying code. The code I wrote
didn't do much self-modifying beyond putting commas into arguments to
recursive routines to implement variable length argument lists and
poor man's structures. -John]
Martin Ward

2004-08-10, 9:03 pm

On Thursday 05 Aug 2004 7:27 pm, you wrote:
> [The IBM 360 architecture specifically permits instruction modification
> without extra serialization. I gather it's still a pain for hardware
> implementers, although it's now much more common to do an EXecute
> instruction to run one instruction created at runtime. -John]


Self-modifying code is common in IBM assembler code in daily use today.
The most common form is a NOP modified to a branch instruction,
for example:

LABEL1 NOP LABEL2
OI LABEL1+1,X'FO'

The Or-Immediate changes the second byte of the NOP: converting
the instruction from a "branch never" to a "branch always".
You save one bit of memory and a whole compare instruction!

The EXecute instruction loads the target instruction into memory,
overwrites part of it with the contents of a register, and executes
the result. It is most often used to produce a variable length move.
But I have seen EXecute used as a sort of one-instruction
subroutine call: EX is only four bytes, so if you use EX to EXecute
a six byte instruction, then you are saving two bytes!

--
Martin

Martin.Ward@durham.ac.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
[Eeeww. I didn't even do that back in 1970. -John]
Scott Moore

2004-08-10, 9:03 pm

Nick Maclaren wrote:

> Absolutely NOT! Certainly by the early 1960s, the concept of
> debugging compilers existed, which were expected to have thorough
> diagnostics, insert good checking and compile very fast. What
> happened to that concept, I wonder? :-(


In the 1960s, few compiled languages had features such as open ended
arrays, ability to create a pointer to any variable and similar
features. It was certainly possible to create a compiler that checked
for almost all access violations and range violations. This is no
longer the case. In fact, the trend is away from such checkable code.
Fortran 90 includes the ability to point to arbitrary variables. With
all of its other faults, at least Fortran was checkable.

--
Samiam is Scott A. Moore

Personal web site: http:/www.moorecad.com/scott
ISO 7185 Standard Pascal web site: http://www.moorecad.com/standardpascal
[Where's PL/I when we need it? -John]

beliavsky@aol.com

2004-08-11, 4:00 pm

nmm1@cus.cam.ac.uk (Nick Maclaren) wrote

> efficiency of >the generated code.
>
> Absolutely NOT! Certainly by the early 1960s, the concept of
> debugging compilers existed, which were expected to have thorough
> diagnostics, insert good checking and compile very fast. What
> happened to that concept, I wonder? :-(


The Lahey/Fujitsu Fortran 95 compiler is a very good debugging
compiler, when the proper compiler options are used. With other
options, it produces pretty fast code.

Dave Thompson

2004-08-23, 4:02 pm

On 9 Aug 2004 00:28:14 -0400, "Nick Roberts" <nick.roberts@acm.org>
wrote:

> On 4 Aug 2004 02:44:48 -0400, Jeff Kenton <Jeffrey.Kenton@comcast.net>

[ asked about selfmodifying code in the Good Old Days ]

> Originally, the COBOL language had an
>
> ALTER x TO PROCEED TO y
>
> statement, which modified a GOTO statement (itself labelled x),
> changing its destination from whatever it was before to y. As you can
> imagine, programmers could -- and generally did -- achieve the most
> spectacularly obfuscated code with this statement. It became quite
> notorious, and quickly became deprecated and then removed from
> compilers altogether.
>

ALTER and "GO TO dot" are/were still in the 1985 standard, although
deprecated; I believe they have finally been actually removed in 2002
or thereabouts, I haven't been following closely, which is not very
quick as I think the first COBOL was mid-50's. I know of at least a
couple of compilers still in active use that implement -85 and so
support this "feature" -- though as you indicate it should be
violently shunned by all right-thinking programmers.

> Unfortunately, I think things have swung too far the other way, in
> that even the humble GOTO is now considered verboten. Silly.


- David.Thompson1 at worldnet.att.net
Jeremy Wright

2004-08-25, 3:59 pm

Dave Thompson wrote:
>
> On 9 Aug 2004 00:28:14 -0400, "Nick Roberts" <nick.roberts@acm.org>
> wrote:
>
> [ asked about selfmodifying code in the Good Old Days ]
>
> ALTER and "GO TO dot" are/were still in the 1985 standard, although
> deprecated; [ snip ] as you indicate it should be
> violently shunned by all right-thinking programmers.


Unfortunately, compilers have to support programs written by
wrong-thinking programmers.

ALTER need not be implemented using self modifying code. The set of
targets that an alterable goto can be set to is known at compile time.
The alterable goto is then just an indexed jump, and the alters set
the index.
Torben Ęgidius Mogensen

2004-09-03, 4:03 pm

Jeremy Wright <jeremy.wright@microfocus.com> writes:

> Dave Thompson wrote:
>
> ALTER need not be implemented using self modifying code. The set of
> targets that an alterable goto can be set to is known at compile time.
> The alterable goto is then just an indexed jump, and the alters set
> the index.


This is true of any self-modifying code: You can work around it at
some small cost in performance. In case of the ALTER, you replace a
direct jump with a jump through a jump table. In extreme cases, you
can add what amounts to small interpreters to handle the apparent
self-modification.

If you are working with a system that uses interpreted byte-code, the
difference becomes muddled: If a program modifies its byte-code, is
that self-modifying code? The byte-code is definitely modified, but
you can argue that that the actual code is the interpreter, which
isn't modified. If you add JIT compilation to the byte-code
interpreter, the issue becomes even murkier.

Torben
glen herrmannsfeldt

2004-09-08, 3:57 am

Torben Ęgidius Mogensen wrote:
(snip regarding self modifying code)

> If you are working with a system that uses interpreted byte-code,
> the difference becomes muddled: If a program modifies its byte-code,
> is that self-modifying code? The byte-code is definitely modified,
> but you can argue that that the actual code is the interpreter,
> which isn't modified. If you add JIT compilation to the byte-code
> interpreter, the issue becomes even murkier.


And if your machine is microprogrammed, does that make a
difference?

If it has writable control store?

If programs are allowed to write into the writable control store?

If the JIT compiler writes to writable control store?

If programs interpreted by the JIT compiler can write to
the writable control store?

Lots of very murky possibilities.

-- glen

Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com