16/32/64 bit
| Vannus 2004-12-21, 3:55 am |
| Does anyone know if it's faster to do a MOV AX,[pointer] than a MOV
EAX,[pointer] or MOV RAX,[pointer]? Wouldn't it be quicker to move 16 bits
around than 32 or 64, even though 32/64 is the processor's 'native' state?
Also, I've been reading about AMD64 long mode (64-bit mode and compatibility
mode), but I haven't found whether you're able to use 16-bit registers while
in long mode. I would like to know if the AL, AH, AX, and EAX registers can
still be MOV'd and ADD'd while in long mode.
Thanks, Jim.
| |
|
| "Vannus" <spamtrap@crayne.org> wrote in message
news:cq86cn$n3t$1@news.freedom2surf.net...
> Does anyone know if it's faster to do a MOV AX,[pointer] than a MOV
> EAX,[pointer] or MOV RAX,[pointer]? Wouldn't it be quicker to move 16 bits
> around than 32 or 64, even though 32/64 is the processor's 'native' state?
Yes/no. It depends entirely on the alignment of the pointer. On an address
that is a multiple of 8, any post-Pentium CPU can load or store 8 bytes of
data in some fixed amount of time (assuming a cache hit). Loading or storing
a single byte is neither faster nor slower. This is why using rep movsd
instead of rep movsb is an important optimization when doing memory copies:
each iteration moves four bytes instead of one.
Of course, if your pointer isn't aligned, then the access can take longer.
The CPU can only read 1, 2, 4, or 8 bytes in a single request on an address
that is a multiple of 1 (every address), 2, 4, or 8 respectively.
Though no x86 CPU with the exception of the Pentium 4 currently can
load/store more than 8 bytes in a single request, you can extrapolate this
further. On an x86 CPU, an unaligned load or store still works, but it will
be handled by executing multiple smaller load/store operations.
So, to answer your question directly, it is entirely possible that they are
equally fast, but that is not *necessarily* so. It all depends on the
pointer value.
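The alignment rule above can be sketched in C. This is only an illustration: the function names are made up for this example, and the 8-byte block size is taken from the "8 bytes in a single request" figure in the post. The boundary at which a real CPU splits an access depends on the microarchitecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of size (natural alignment).
   size must be 1, 2, 4, or 8, matching the access widths in the post. */
static bool is_naturally_aligned(uintptr_t addr, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    return (addr & (uintptr_t)(size - 1)) == 0;
}

/* Number of aligned 8-byte blocks an access of size bytes at addr touches.
   ASSUMPTION: the CPU fetches one aligned 8-byte block per request.
   1 means a single request is enough; 2 means the access straddles a
   block boundary and has to be split. */
static unsigned blocks_touched(uintptr_t addr, unsigned size)
{
    assert(size >= 1 && size <= 8);
    uintptr_t first = addr / 8;
    uintptr_t last = (addr + size - 1) / 8;
    return (unsigned)(last - first) + 1;
}
```

Under this model, a 4-byte load at 0x1006 touches two blocks and is split, while an 8-byte load at 0x1000 touches one, so the wider load can be the faster of the two.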
> Also, I've been reading about AMD64 long mode (64-bit mode and compatibility
> mode), but I haven't found whether you're able to use 16-bit registers while
> in long mode. I would like to know if the AL, AH, AX, and EAX registers can
> still be MOV'd and ADD'd while in long mode.
16-bit registers are available in long mode, as are the 8-bit registers, with
one caveat. There are restrictions on when you can access ah, ch, dh, and
bh. Whenever a REX byte is present in an instruction, you can't access them:
with a REX prefix, the register encodings that normally select them select
spl, bpl, sil, and dil instead.
REX bytes are prefixed to instructions in the following cases:
1) The instruction accesses any of the new registers (r8-r15)
2) The instruction accesses spl, bpl, sil, or dil
3) The instruction accesses a 64-bit register
So, AFAIK you can't do movsx rax, ah, but you can do inc ah or add cl, ah.
Performance-wise it isn't a good idea to use ah, ch, dh, and bh.
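The three REX rules above can be written down as a small C sketch. The struct and function names are hypothetical and exist only for this illustration. The bit layout of the REX byte (0100WRXB) is the standard AMD64 one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the registers one instruction touches. */
struct operand_use {
    bool new_register;    /* rule 1: any of r8-r15 */
    bool low_byte_of_ptr; /* rule 2: spl, bpl, sil, or dil */
    bool operand_64bit;   /* rule 3: a 64-bit register such as rax */
    bool high_byte;       /* ah, ch, dh, or bh */
};

/* A REX prefix is required if any of the three rules applies. */
static bool needs_rex(const struct operand_use *u)
{
    return u->new_register || u->low_byte_of_ptr || u->operand_64bit;
}

/* ah/ch/dh/bh cannot be encoded in an instruction that carries REX. */
static bool is_encodable(const struct operand_use *u)
{
    return !(u->high_byte && needs_rex(u));
}

/* Build a REX byte: fixed high nibble 0100, then W, R, X, B.
   W = 1 selects 64-bit operand size; R, X, B extend register fields. */
static uint8_t rex_prefix(unsigned w, unsigned r, unsigned x, unsigned b)
{
    assert(w <= 1 && r <= 1 && x <= 1 && b <= 1);
    return (uint8_t)(0x40u | (w << 3) | (r << 2) | (x << 1) | b);
}
```

For movsx rax, ah both operand_64bit and high_byte are set, so is_encodable() reports false. For inc ah or add cl, ah only high_byte is set, no REX is needed, and the instruction encodes.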
As a general rule, any instruction that encodes in 32-bit mode will encode
in 64-bit mode except for a few complex instructions (e.g. pusha, popa) that
have been removed from the instruction set.
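One encoding detail is worth a sketch here. In 32-bit code the single bytes 0x40 through 0x4F are the short forms of inc and dec on a 32-bit register, and in 64-bit mode the same bytes are the REX prefixes. An instruction like inc eax is still available in 64-bit mode, but the assembler has to pick its longer ModRM encoding. The enum and function names below are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

enum cpu_mode { MODE_32BIT, MODE_64BIT };

enum byte_meaning {
    MEANING_INC_REG,   /* 0x40-0x47 in 32-bit code: inc eax .. inc edi */
    MEANING_DEC_REG,   /* 0x48-0x4F in 32-bit code: dec eax .. dec edi */
    MEANING_REX_PREFIX /* 0x40-0x4F in 64-bit mode */
};

/* Classify one byte in the 0x40-0x4F range for the given mode. */
static enum byte_meaning classify_4x(uint8_t byte, enum cpu_mode mode)
{
    assert(byte >= 0x40 && byte <= 0x4F);
    if (mode == MODE_64BIT)
        return MEANING_REX_PREFIX;
    return byte < 0x48 ? MEANING_INC_REG : MEANING_DEC_REG;
}
```

So the byte 0x48 is dec eax in 32-bit code, but in 64-bit mode it is the REX prefix with W set that precedes most instructions operating on 64-bit registers.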
-Matt
| |
| Vannus 2004-12-22, 3:55 am |
|
Thanks, your answers were exactly what I was looking for :)
Jim.
| |