Code Comments
Hi all,

To test HLA, I generate several different equivalent source files in FASM, MASM, NASM, Gas, and HLA, compile them, disassemble the executables they produce, and then diff the disassembly files. This works great as long as HLA can be coerced to produce object code in the exact same form as (each of) the other assemblers. As the test is automated, it provides a great regression test tool that I can run anytime I make changes to the HLA system.

In the past, if a particular assembler generated different code from HLA, and both code sequences were semantically equivalent (i.e., different encodings for the same instruction), I simply disabled that particular test and left it up to one of the other tests (with a different assembler) to catch any defects that crept into the code generator.

For HLA v1.102, however, I'm adding a feature that allows HLA v1.102 to generate the same "object code signature" as MASM, FASM, TASM, Gas, and NASM, as much as is reasonable (i.e., I don't generate bad opcodes if one of these assemblers has a bug in the instruction encoding).

One curious thing I've noticed is that the presence/absence of a 0x66 size prefix byte, for 16-bit only instructions, is all over the map. For example, consider the following two instructions:

    mov ds, ax
    mov ax, ds

Clearly, there are only 16-bit versions of these two instructions. Some assemblers *always* put a 0x66 size prefix byte in front of the encodings, some never do, and at least one (MASM) puts size prefix bytes before one but not the other.

The question I have is this: Is there some situation (i.e., some specific CPU) where the size prefix is absolutely necessary? Quite honestly, I've never executed these instructions in 32-bit mode, so I don't even know if they work. But given the number of assemblers that emit these instructions without the size prefix (in 32-bit mode), I assume that they still work properly? Or is this just a bug in those assemblers?

The Intel documentation states: "In 32-bit mode, the assembler may insert the 16-bit operand-size prefix with this instruction (see the following "Description" section for further information)."

As I read this, the 0x66 prefix byte is purely optional. However, that statement may not apply to some non-Intel CPUs (unlikely, but I'm asking because I just don't know).

Is there any reason to waste a byte and emit these prefixes?

Thanks,
Randy Hyde
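The comparison step of the regression test described above might look roughly like the following Python sketch. It is not Randy's actual tool: the function name `diff_disassembly` and the sample listing text are hypothetical, and it assumes both disassemblies are already available as plain text with one instruction per line.

```python
import difflib


def diff_disassembly(reference, candidate):
    """Return a list of (reference_lines, candidate_lines) pairs that differ.

    Both arguments are disassembly listings as plain text. Blank lines and
    surrounding whitespace are ignored so that only real differences show up.
    """
    ref_lines = [ln.strip() for ln in reference.splitlines() if ln.strip()]
    cand_lines = [ln.strip() for ln in candidate.splitlines() if ln.strip()]
    matcher = difflib.SequenceMatcher(a=ref_lines, b=cand_lines, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((ref_lines[i1:i2], cand_lines[j1:j2]))
    return changes


# Hypothetical listings: the second assembler adds a 0x66 prefix to one line.
without_prefix = "8ED8  mov ds, ax\n8CD8  mov ax, ds\n"
with_prefix = "668ED8  mov ds, ax\n8CD8  mov ax, ds\n"

for ref, cand in diff_disassembly(without_prefix, with_prefix):
    print("reference:", ref, "candidate:", cand)
```

An empty result means the two assemblers produced the same "object code signature" for that test file; any remaining pair points at an encoding difference such as the optional prefix discussed in this thread.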
"rhyde@cs.ucr.edu" asked: [about seg reg size]

No Randy, there never was any requirement to tell a 'load seg-reg' instruction that a seg-reg can only hold 16 bits. The opposite direction, i.e. "mov eax,es", zero-extends on modern CPUs, while some older ones left the high word (upper 16 bits) undefined or unaltered. A "push es | pop eax" sequence, however, may just show any previous stack contents in the high word of eax instead of zero.

So neither a 66 nor a 67 prefix is required for a seg-reg load, even if some tools emit them for museum-compatibility reasons. Push/pop code should be aware of the high word, though, unless it is only used to save and restore seg-regs.

__
wolfgang

I expect Rene will be very happy to see you post again :)
On Mar 18, 4:24 pm, "rh...@cs.ucr.edu" <spamt...@crayne.org> wrote:
> Hi all,
>
> To test HLA, I generate several different equivalent source files in
> FASM, MASM, NASM, Gas, and HLA, compile them, disassemble the
> executables they produce, and then diff the disassembly files. This
> works great as long as HLA can be coerced to produce object code in
> the exact same form as (each of) the other assemblers. As the test is
> automated, it provides a great regression test tool that I can run
> anytime I make changes to the HLA system.
>
> In the past, if a particular assembler generated different code from
> HLA, and both code sequences were semantically equivalent (i.e.,
> different encodings for the same instruction) I simply disabled that
> particular test and left it up to one of the other tests (with a
> different assembler) to catch any defects that crept into the code
> generator.
>
> For HLA v1.102, however, I'm adding a feature that allows HLA v1.102
> to generate the same "object code signature" as MASM, FASM, TASM, Gas,
> and NASM, as much as is reasonable (i.e., I don't generate bad opcodes
> if one of these assemblers has a bug in the instruction encoding).
>
> One curious thing I've noticed is that the presence/absence of a 0x66
> size prefix byte, for 16-bit only instructions, is all over the map.
> For example, consider the following two instructions:
>
> mov ds, ax
> mov ax, ds
>

This issue revolves around the machine code generated by:

    mov eax,ebx
-and-
    mov ax,bx

..both share the same opcode bytes (89 D8)!

    ;; -f bin -l tst4.lst -o tst4.bin tst4.nsm
    ;;
    [bits 16]

    00000000 89D8      mov ax,bx
    00000002 6689D8    mov eax,ebx
    [bits 32]
    00000005 89D8      mov eax,ebx
    00000007 6689D8    mov ax,bx

    ;; -= eof =-

You'll notice the change in sense between the two across the two different sections. The second set is destined for a segment described by a 32-bit descriptor, the first set for a segment described by a 16-bit descriptor. There is a bit in the descriptor for a code segment which indicates the 'default instruction size'; that is to say, 'this' descriptor describes a 32-bit code segment, _or not_ (i.e. 16-bit code).

> Clearly, there are only 16-bit versions of these two instructions.
> Some assemblers *always* put a 0x66 size prefix byte in front of the
> encodings, some never do, and at least one (MASM) puts size prefix
> bytes before one but not the other.
>
> The question I have is this: Is the some situation (i.e., some
> specific CPU) where the size prefix is absolutely necessary? Quite
> honestly, I've never executed these instructions in 32-bit mode, so I
> don't even know if they work. But given the number of assemblers that
> emit these instructions without the size prefix (in 32-bit mode), I
> assume that they still work properly? Or is this just a bug in those
> assemblers?
>

I dunno, what does..

    mov eax,DS

..do? Automatically clear the high word? Or just overwrite the low word?

> The Intel documentation states: "In 32-bit mode, the assembler may
> insert the 16-bit operand-size prefix with this instruction (see the
> following "Description" section for further information)."
>
> As I read this, the 0x66 prefix byte is purely optional. However, that
> statement may not apply to some non-Intel CPUs (unlikely, but I'm
> asking because I just don't know).
>

Not optional.

> Is there any reason to waste a byte and emit these prefixes?

So as not to hobble code, confuse users?

> Thanks,
> Randy Hyde

Steve
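The flip shown in Steve's listing can be modelled in a few lines of Python. This is only a sketch of the rule for register-to-register `mov` (opcode 0x89 with a ModRM byte); the function name `encode_mov_reg_reg` is hypothetical, and the `bits` argument stands in for the default operand size that the code segment descriptor selects.

```python
# Register numbers as used in the ModRM byte (index = encoding).
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["e" + name for name in REG16]


def encode_mov_reg_reg(bits, dst, src):
    """Encode 'mov dst, src' for two general registers of equal width.

    bits is the default operand size of the code segment (16 or 32).
    A 0x66 prefix is emitted only when the operand width differs from it.
    """
    if bits not in (16, 32):
        raise ValueError("bits must be 16 or 32")
    if dst in REG16 and src in REG16:
        width, regs = 16, REG16
    elif dst in REG32 and src in REG32:
        width, regs = 32, REG32
    else:
        raise ValueError("operands must be two registers of the same width")
    # ModRM: mod=11 (register direct), reg=source, rm=destination.
    modrm = 0xC0 | (regs.index(src) << 3) | regs.index(dst)
    prefix = b"\x66" if width != bits else b""
    return prefix + bytes([0x89, modrm])


for bits, dst, src in [(16, "ax", "bx"), (16, "eax", "ebx"),
                       (32, "eax", "ebx"), (32, "ax", "bx")]:
    print(bits, dst, src, encode_mov_reg_reg(bits, dst, src).hex().upper())
```

The four calls correspond to the four instructions in the listing, so the output can be checked against the byte columns there.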
rhyde@cs.ucr.edu wrote:
...
> One curious thing I've noticed is that the presence/absence of a 0x66
> size prefix byte, for 16-bit only instructions, is all over the map.
> For example, consider the following two instructions:
>
> mov ds, ax
> mov ax, ds
>
> Clearly, there are only 16-bit versions of these two instructions.

True, "mov ds, ax" and "mov ds, eax" - with and without the prefix, IOW - do exactly the same thing. "mov ax, ds" and "mov eax, ds" are *not* the same, and the prefix is relevant.

> Some assemblers *always* put a 0x66 size prefix byte in front of the
> encodings, some never do, and at least one (MASM) puts size prefix
> bytes before one but not the other.

That's interesting... Nasm went round-and-round on the issue a while back. Referring to segreg as a *destination*, Intel said "most assemblers" emit the size prefix, and you could use "mov ds, eax" (absurd, on the face of it) to avoid it. It sounded like they were saying you should do it, but if you read closer, they almost said that those "most assemblers" were "doing it wrong" to emit the useless prefix - or making us write something that *looks* like a size-mismatch to avoid it. We took an informal survey, and Masm was about the only assembler that *was* doing it, at that time (Nasm used to, but stopped). Sounds like Masm has stopped, too. Who's doing "both"?

> The question I have is this: Is the some situation (i.e., some
> specific CPU) where the size prefix is absolutely necessary? Quite
> honestly, I've never executed these instructions in 32-bit mode, so I
> don't even know if they work. But given the number of assemblers that
> emit these instructions without the size prefix (in 32-bit mode), I
> assume that they still work properly? Or is this just a bug in those
> assemblers?

I would say, in 32-bit code, with segreg as source, and reg16 as destination, it would be a bug to *not* emit the 66h. One would expect this to leave the upper word of reg32 untouched. If one says "mov eax, ds" (no override), my P4 zeros the high word. It may be "undefined", but if we say "eax" we're not "expecting" it to remain untouched. As you point out, these instructions are so rarely used that I'm not sure anybody really "expects" anything...

> The Intel documentation states: "In 32-bit mode, the assembler may
> insert the 16-bit operand-size prefix with this instruction (see the
> following "Description" section for further information)."

Does this refer to segreg as source, or as destination?

> As I read this, the 0x66 prefix byte is purely optional. However, that
> statement may not apply to some non-Intel CPUs (unlikely, but I'm
> asking because I just don't know).
>
> Is there any reason to waste a byte and emit these prefixes?

In the case of segreg as destination, no. In the case of reg16 as destination, it's not a "waste".

That's the current state of what I think I know on the subject, anyway...

Best,
Frank
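The difference Frank describes for segreg-as-source in 32-bit code can be written down as a small register model. This is a sketch only: `mov_gpr_from_sreg` is a hypothetical name, and the 32-bit branch assumes the zero-extending behavior Frank reports seeing on his P4, which the thread says may be undefined or different on older CPUs.

```python
def mov_gpr_from_sreg(old_reg32, sreg_value, operand_bits):
    """Model the 32-bit register value after 'mov reg, sreg'.

    operand_bits=16 models 'mov ax, ds' (0x66 prefix in 32-bit code):
        only the low word is written, the high word is preserved.
    operand_bits=32 models 'mov eax, ds' (no prefix in 32-bit code):
        ASSUMPTION: the high word is zeroed, as observed on a P4 in the
        thread; older CPUs may behave differently.
    """
    selector = sreg_value & 0xFFFF  # segment registers hold 16 bits
    if operand_bits == 16:
        return (old_reg32 & 0xFFFF0000) | selector
    if operand_bits == 32:
        return selector
    raise ValueError("operand_bits must be 16 or 32")


print(hex(mov_gpr_from_sreg(0xDEADBEEF, 0x0023, 16)))
print(hex(mov_gpr_from_sreg(0xDEADBEEF, 0x0023, 32)))
```

With segreg as the *destination* there is nothing comparable to model, since only 16 bits are ever consumed; that is why the prefix changes nothing in that direction.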
"Frank Kotler" <spamtrap@crayne.org> wrote in message news:JN2Ej.7276$rR1.5825@trndny09...
> rhyde@cs.ucr.edu wrote:
>
> True, "mov ds, ax" and "mov ds, eax" - with and without the prefix, IOW
> - do exactly the same thing. "mov ax, ds" and "mov eax, ds" are *not*
> the same, and the prefix is relevant.
>
>
> That's interesting... Nasm went round-and-round on the issue a while
> back. Referring to segreg as a *destination*, Intel said "most
> assemblers" emit the size prefix, and you could use "mov ds, eax"
> (absurd, on the face of it) to avoid it. It sounded like they were
> saying you should do it, but if you read closer, they almost said that
> those "most assemblers" were "doing it wrong" to emit the useless prefix
> - or making us write something that *looks* like a size-mismatch to
> avoid it. We took an informal survey, and Masm was about the only
> assembler that *was* doing it, at that time (Nasm used to, but stopped).
> Sounds like Masm has stopped, too. Who's doing "both"?
>

Wow... That's not what I got from their docs. What I got was, "If you use the 16-bit form of mov to a segment register in 32-bit mode, instead of using the 32-bit form of mov to a segment register in 32-bit mode, some assemblers will generate an unnecessary 0x66 operand size override prefix due to the 16-bit segment register in the instruction." Since Randall was referring to 16-bit instructions, I thought he was referring to 16-bit mode too...

This is what I would expect an assembler to do:

BITS 16
mov ds, ax ; no 0x66
mov ds, eax ; yes 0x66, but unneeded
mov ax, ds ; no 0x66
mov eax, ds ; yes 0x66 - required because of cpu dependent 32-bit operation

BITS 32
mov ds, ax ; yes 0x66, but unneeded
mov ds, eax ; no 0x66
mov ax, ds ; yes 0x66 - required to ensure 16-bit only operation
mov eax, ds ; no 0x66

The reason I expect that is because the address and operand size prefixes can be used to execute 16-bit code in a 32-bit segment and vice-versa. So, I expect the assembler to place the overrides properly, even if unnecessary. But, the override prefixes are required to ensure the proper operation for ax/eax.

Rod Pemberton
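Rod's table and Frank's "no prefix when segreg is the destination" position can both be expressed as one small encoder. The sketch below is illustrative rather than any assembler's real code generator: `encode_sreg_mov` and its `skip_redundant` flag are hypothetical names, and only register-direct operands are handled (opcode 0x8E for moves into a segment register, 0x8C for moves out of one).

```python
SREG = ["es", "cs", "ss", "ds", "fs", "gs"]  # index = ModRM reg field
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["e" + name for name in REG16]


def encode_sreg_mov(bits, dst, src, skip_redundant=False):
    """Encode 'mov sreg, reg' or 'mov reg, sreg' (register operands only).

    bits: default operand size of the code segment (16 or 32).
    skip_redundant: hypothetical policy flag; when True the 0x66 prefix is
    never emitted for a segment-register destination, where it has no effect.
    """
    if bits not in (16, 32):
        raise ValueError("bits must be 16 or 32")
    if dst in SREG:
        if dst == "cs":
            raise ValueError("mov to cs is not a valid instruction")
        opcode, sreg, gpr, sreg_is_dest = 0x8E, dst, src, True
    elif src in SREG:
        opcode, sreg, gpr, sreg_is_dest = 0x8C, src, dst, False
    else:
        raise ValueError("one operand must be a segment register")
    if gpr in REG16:
        width, rm = 16, REG16.index(gpr)
    elif gpr in REG32:
        width, rm = 32, REG32.index(gpr)
    else:
        raise ValueError("other operand must be a 16- or 32-bit register")
    # ModRM: mod=11 (register direct), reg=segment register, rm=general register.
    modrm = 0xC0 | (SREG.index(sreg) << 3) | rm
    emit_prefix = width != bits
    if skip_redundant and sreg_is_dest:
        emit_prefix = False
    return (b"\x66" if emit_prefix else b"") + bytes([opcode, modrm])


for bits in (16, 32):
    for dst, src in [("ds", "ax"), ("ds", "eax"), ("ax", "ds"), ("eax", "ds")]:
        print(bits, f"mov {dst}, {src}", encode_sreg_mov(bits, dst, src).hex().upper())
```

With the default `skip_redundant=False` the loop follows the table above; passing `skip_redundant=True` drops the prefix only in the two segreg-destination rows and leaves the segreg-source rows untouched, which is where the prefix actually changes behavior.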
Rod Pemberton wrote:
> "Frank Kotler" <spamtrap@crayne.org> wrote in message
> news:JN2Ej.7276$rR1.5825@trndny09...
>
> Wow... That's not what I got from their docs. What I got was, "If you use
> the 16-bit form of mov to a segment register in 32-bit mode, instead of
> using the 32-bit form of mov to a segment register in 32-bit mode, some
> assemblers will generate an unnecessary 0x66 operand size override prefix due
> to the 16-bit segment register in the instruction."

That's what I intended to say, more or less.

> Since Randall was
> referring to 16-bit instructions, I thought he was referring to 16-bit mode
> too...

I'm pretty sure Randy remembers 16-bit code, but I don't think HLA does, so what we do in 32-bit code is probably the "important" point here.

> This is what I would expect an assembler to do:
>
> BITS 16
> mov ds, ax ; no 0x66
> mov ds, eax ; yes 0x66, but unneeded
> mov ax, ds ; no 0x66
> mov eax, ds ; yes 0x66 - required because of cpu dependent 32-bit operation
>
> BITS 32
> mov ds, ax ; yes 0x66, but unneeded
> mov ds, eax ; no 0x66
> mov ax, ds ; yes 0x66 - required to ensure 16-bit only operation
> mov eax, ds ; no 0x66

Bloat! Bloat! Bloat! :)

Seriously, I'll agree that this one *really* "doesn't matter". But it seems "wrong" to me for an assembler to emit *any* unneeded byte. Besides the two "unneeded" cases - one if we're talking 32-bit code - I don't think there's any issue.

> The reason I expect that is because the address and operand size prefixes
> can be used to execute 16-bit code in a 32-bit segment and vice-versa. So,
> I expect the assembler to place the overrides properly, even if unnecessary.

"properly, even if unnecessary" strikes me as an oxymoron.

> But, the override prefixes are required to ensure the proper operation for
> ax/eax.

If segreg is the source, yes. Randy's up to his ears, or deeper, in x86 encoding right now, so *surely* he knows better than to expect any such concept from x86, but he phrased the question as if "mov ds, ax" and "mov ax, ds" were "symmetrical". They are not. With ds as a destination, it really is "always 16-bit". With ds as source, the prefix makes a difference.

Eric Isaacson claims to have an "identifiable" object signature (based on other instructions, not these). Maybe Randy could emit the "bloat prefix" every third time it comes up. Then, if it showed up every six years, we'd know you were using HLA! :)

(maybe better make it 12 years... this is really small potatoes! even with reg16/32 as a destination, how likely is it that anyone gives a damn what happens to the high word?)

Best,
Frank
"Frank Kotler" <spamtrap@crayne.org> wrote in message news:pxeEj.12292$FK1.10586@trndny08...
> "properly, even if unnecessary" strikes me as an oxymoron.

Otherwise, you introduce asymmetric behavior into the assembler... i.e., higher probability of coding errors IMO. That's more confusing than an oxymoron. ;)

> how likely is it that anyone gives a
> damn what happens to the high word?)

It's not likely I'd ever need it, but if I did it's likely I would expect 'mov ax, ds' in BITS32 to execute exactly like BITS16 - leaving upper bits alone... As for the cpu dependent version, it's possible someone could be using that "defect" in a cpu detection routine. I guess I believe in "Do no harm." when it comes to assemblers.

Rod Pemberton
Rod Pemberton wrote:
> "Frank Kotler" <spamtrap@crayne.org> wrote in message
> news:pxeEj.12292$FK1.10586@trndny08...
>
> Otherwise, you introduce asymmetric behavior into the assembler... i.e.,
> higher probability of coding errors IMO. That's more confusing than an
> oxymoron. ;)

Worse than a moreyoxon, even. But I don't see how matching the behavior of the hardware increases the probability of coding errors. [even if we write "mov eax, ds"...]

> It's not likely I'd ever need it, but if I did it's likely I would expect
> 'mov ax, ds' in BITS32 to execute exactly like BITS16 - leaving upper bits
> alone...

Agreed. We *need* the override here, no question. (when we need the instruction at all in 32-bit code...) But if an assembler were to screw up an instruction, this'd be a fairly painless one.

> As for the cpu dependent version, it's possible someone could be
> using that "defect" in a cpu detection routine.

What's the "CPU dependent version"? "mov eax, ds" (no override, in 32-bit code)? Okay, you got it. (I'm not sure this *does* ever do anything but zero the high bits, but feel free to check.)

> I guess I believe in "Do no harm." when it comes to assemblers.

That's why we should *never* emit the useless, redundant, cache-filling override when segreg is the *destination*. This one really *is* "always 16 bits". Might as well emit a "nop". Again, it'd be a painless one to get wrong.

Best,
Frank
On Mar 21, 2:38 am, Frank Kotler <spamt...@crayne.org> wrote:
>
> That's why we should *never* emit the useless, redundant, cache-filling
> override when segreg is the *destination*. This one really *is* "always
> 16 bits". Might as well emit a "nop". Again, it'd be a painless one to
> get wrong.

A quick follow-up. If the documentation claims that the H.O. bits of EAX are undefined after "mov eax,ds" (as opposed to being zero or left unchanged), why bother emitting the prefix? There are other instructions in the x86 instruction set where Intel does this.

hLater,
Randy Hyde
"rhyde@cs.ucr.edu" <spamtrap@crayne.org> wrote in message news:73f814ec-3818-4b99-8870-62b9364205f4@e10g2000prf.googlegroups.com...
> On Mar 21, 2:38 am, Frank Kotler <spamt...@crayne.org> wrote:
>
> A quick follow-up.
> If the documentation claims that the H.O. bits of EAX are undefined
> after "mov eax,ds" (as opposed to being zero or left unchanged), why
> bother emitting the prefix?

Read my first response to Frank.

RP