Code Comments
Programming Forum and web based access to our favorite programming groups.Hi, I've just implemented register allocation by splitting three address code instructions into basic blocks, and computing liveness and next use for each block. I'm assuming that all variables are live at the end of the block, and all temporaries dead, and move backwards. This technique works well, except for the following scenario. Assume the following code segment: print(a(2) + a(-3)); The TAC would be something of this sort: == Block 1 ========= PUSHPARAM 2 CALL t1 A //call A and store the return value in t1 == Block 2 ========= PUSHPARAM -3 CALL t2 A == Block 3 ========= ADD t3 = t1, t2 PRINT t3 ================= The problem is that the temporaries in both block 1 (t1) and block 2 (t2) would be discarded (assumed dead since they are at the end of the block) and block 3 can't compute the addition. Am I missing something here? Many thanks for all your help, K. Phillips. p.s. I know about the graph colouring technique, I just want to try both techniques.
Post Follow-up to this messageOn Mar 22, 8:22 pm, kphillips <kevin.phillip...@yahoo.com> wrote: > The TAC would be something of this sort: > > == Block 1 ========= > PUSHPARAM 2 > CALL t1 A //call A and store the return value in > t1 > == Block 2 ========= > PUSHPARAM -3 > CALL t2 A > == Block 3 ========= > ADD t3 = t1, t2 > PRINT t3 > ================= > > The problem is that the temporaries in both block 1 (t1) and block 2 > (t2) would be discarded (assumed dead since they are at the end of the > block) and block 3 can't compute the addition. > > Am I missing something here? Perhaps the definition of a basic block. The definition of a temporary requires that none of its defs can reach the end of the corresponding block. Obviously your t1..3 don't meet this requirement.
Post Follow-up to this messagekphillips <kevin.phillips83@yahoo.com> writes: > ... > I've just implemented register allocation by splitting three address > code instructions into basic blocks, and computing liveness and next > use for each block. I'm assuming that all variables are live at the > end of the block, and all temporaries dead, and move backwards. > > This technique works well, except for the following scenario. > [expression containing procedure calls as subexpressions] > ... > Am I missing something here?... Because the lifetime of each of your temporaries is contained within a single expression, and because an expression is usually contained within a single basic block, you are expecting the lifetime of each temporary to be contained within a single basic block. But sometimes an expression can contain control flow, and hence be divided across multiple basic blocks, resulting in multi-block lifetimes for temporaries. You found one case of this. Another case arises if you have conditional (if-then-else) expressions, such as the ternary ?: operator in C-family languages, or short-circuit boolean operators, such as the && and || operators in C-family langauges. If your language has only procedure calls as intra-expression control flow, and if the only reason you are breaking the code into basic blocks is for your local register allocation, then the simplest solution would be to treat procedure calls as block boundaries. The blocks won't be basic blocks any more -- but they probably will be what you want. But if you are going to accommodate other control flow within expressions (such as from ?:, &&, ||), then I think that you are going to have to make some more major change in the structure of your compiler. Off the top of my head, here are some options that seem plausible: (0) Bite the bullet and do global analysis -- but I know you are setting that option aside. (1) Transform the source program (for example, by AST rewriting prior to translation to a three-address code) so as to not have any control flow within an expression. This would entail introducing extra variables and breaking statements containing complex expressions into multiple statements. (2) Rather than breaking the three-address code into blocks based only on information within the three-address code (such as the control flow instructions or the non-procedure-call control flow instructions), break the three-address code into blocks at boundaries that correspond to the source-level statements. That way,q temporaries' lifetimes will be contained within blocks. (The blocks will again not be basic blocks.) This may require some restructuring of your compiler if the information about source-level statements is gone before you do the breaking into blocks.
Post Follow-up to this messageThis is a correction to my own earlier post. I accidentally missed a "not" in the following: > ... then the simplest > solution would be to treat procedure calls as block boundaries. The > blocks won't be basic blocks any more -- but they probably will be > what you want. I meant to suggest *not* treating procedure calls as block boundaries.
Post Follow-up to this messageGene <gene.ressler@gmail.com> writes: > On Mar 22, 8:22 pm, kphillips <kevin.phillip...@yahoo.com> wrote: > > Perhaps the definition of a basic block. Actually, the original poster doesn't seem to have missed the definition of a basic block. The CALL instructions do create basic block boundaries. And what you say in the remainder of your message doesn't contradict this. Instead, you seem to be suggesting that he missed the definition of a temporary. > The definition of a temporary requires that none of its defs can reach > the end of the corresponding block. Obviously your t1..3 don't meet > this requirement. I'm not convinced that "temporary" has a consensus meaning, the way "basic block" does. I've seen plenty of authors other than the original poster use "temporary" to mean "a name introduced by the compiler, as opposed to appearing in the source code." But even if we accept your definition of "temporary" as meaning "local" (to a basic block), the way you would do that would not be by stipulating that the definitions can't reach the end of the block, but rather by stipulating that they not be live at the end of the block. The last of the definitions for any given name will always reach the end of the block. (Recall, a definition "reaches" a point if control must have passed through the definition on the way to the point without passing through any other definition of the same name in between.)
Post Follow-up to this messageMany thanks for the suggestions given. I will definitely analyse the pros and cons and hopefully come up with a solution. It does look strange that no book pointed this out. But at least I know that I'm on the right track. Once again, thanks for your help. Regards, Kevin Phillips
Post Follow-up to this messageOn Mar 23, 1:16 pm, Max Hailperin <m...@gustavus.edu> wrote: > Gene <gene.ress...@gmail.com> writes: > > > Actually, the original poster doesn't seem to have missed the > definition of a basic block. The CALL instructions do create basic > block boundaries. And what you say in the remainder of your message > doesn't contradict this. Instead, you seem to be suggesting that he > missed the definition of a temporary. > > > I'm not convinced that "temporary" has a consensus meaning, the way > "basic block" does. I've seen plenty of authors other than the > original poster use "temporary" to mean "a name introduced by the > compiler, as opposed to appearing in the source code." > > But even if we accept your definition of "temporary" as meaning > "local" (to a basic block), the way you would do that would not be by > stipulating that the definitions can't reach the end of the block, but > rather by stipulating that they not be live at the end of the block. > The last of the definitions for any given name will always reach the > end of the block. (Recall, a definition "reaches" a point if control > must have passed through the definition on the way to the point > without passing through any other definition of the same name in > between.) You're right I should have said temps must be dead at the end of the block, not unreachable. Terminology brain cramp. Sorry On the other hand, I disagree about basic blocks. I was refering to the ASU definition -- maximal sequences of code with no gotos. The intent is for block boundaries to fall at branches and convergences of flow. Call/returns don't cause flow branches. In this case, the OP is apparently using library calls for array indexing and output, certainly non-branching operations.
Post Follow-up to this messagekphillips <kevin.phillips83@yahoo.com> writes: > Many thanks for the suggestions given. I will definitely analyse the > pros and cons and hopefully come up with a solution. It does look > strange that no book pointed this out. But at least I know that I'm on > the right track. Once again, thanks for your help. Actually, it's not strange to me at all that no book you read pointed this out. It's at the level of detail that is often skipped. (see below) It has no theoretical implications and at the practical level any of Max's suggestions will work equally well. I worked on lots of code generators in the days when this was a relevant issue and each author picked a different one of the possibilities, mostly without thinking that it was worth commenting on. (Actually, I think each of the compilers I worked on did explain the exact choices made, but only in the parts of the code where the choices were implemented.) Reaching back into the dim recesses of my brain, I recall that Bob Freiberghouse (author of optimizing PL/I, FORTRAN, et. al. compilers) used different numbering schemes for temporaries which could cross a basic block boundary and ones that could not. Ones within a block were given positive numbers and were always freed by the end of the block. Ones that could cross a basic block were given negative numbers and explicitly freed by an instruction which marked their last use. If I recall correctly, for temporary allocation a call did not break a basic block. However, for other optimizations it did. Fred Chow (author of Pascal, C, et. al. compilers) treated temporaries like variables, but had a local numbering scheme for all variable uses within a basic block and a global numbering scheme for uses of a variable without respect to block boundaries. From any given use, one could always access both numbering schemes. I believe calls always broke basic blocks. My recollection of other compilers is more sketchy, as I did significantly more work on the above suites than on PCC, GCC, or any other compilers. However, the issue was well on the mind of C compiler writers as you will find that the standard (at one time at least) talked about which values had to be preserved/restored over setjmp/longjmp calls. The motivation for that was to allow temporaries to be kept in registers across calls. In general, most compilers that I have seen treat temporaries like variables. That is generally fairly simple to implement and is easy to get semantically correct. If one uses that as the general case, but then detects and special cases the temporaries which live entirely in one basic block, one gets most of the easy-to-find important cases. I've seen more divergence in whether a call breaks a block or not. There are definite advantages both ways. The problem with calls is the semantics that often get layered on top of them, such as non-local gotos (exceptions or signals) or aliasing and non-local reference issues. below: While I am not surprised you haven't found a book that covered that aspect, I suspect that it isn't that it is never addressed in compiler texts. For example, I would be willing to bet that Bob Morgan's book covers it, as did Bill Wulf's book, and the book on the VAX PL/I compiler. I would also be surprised if it wasn't addressed somewhere in Steve Muchnick's book. I know that Fred Chow's PhD thesis explains the choices he made in that area. Unfortunately, most of those books are either old or not at the introductory level. Moreover, I think there are many fine compiler texts that wouldn't be better if they mentioned that kind of detail. I suspect most student compilers implement languages where not breaking a block at a call boundary solves the problem and they don't have any semantics where doing that causes things to break. Hope this helps, -Chris **************************************** ************************************ ** Chris Clark Internet: christopher.f.clark@compiler-resources.co m Compiler Resources, Inc. or: compres@world.std.com 23 Bailey Rd Web Site: http://world.std.com/~compres Berlin, MA 01503 voice: (508) 435-5016 USA fax: (978) 838-0263 (24 hours) ---------------------------------------------------------------------------- --
Post Follow-up to this message> I suspect most student > compilers implement languages where not breaking a block at a call > boundary solves the problem and they don't have any semantics where > doing that causes things to break. Thanks for your in-depth reply. I'm still relatively new to the area and started off through books that are intended for academic courses. Augmenting the grammar / features does in fact lead you to situations not addressed by the same books (although most of the questions can be answered in the dragon book ...). Still, this mailing list archive has been a great resource so far in looking for answers and leads. Regards, K. Phillips
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread
Powered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.