Author File I/O
amitkr

2005-01-12, 8:55 pm

Hi, I am a newbie, so any help will be much appreciated.

I am trying to read a file and display the bytes read.

I am able to do so, but the output is shown as ASCII characters, and I want
the bytes to be displayed as hex values instead.


How do I do this? Please help.

wolfgang kern

2005-01-13, 8:57 am


Jack Klein wrote:

[..Ascii2hex]
| ; character to hex
| ; called with the character to be converted in al
| ; returns high ASCII hex character in DH, low
| ; ASCII hex character in DL
|
| byte_to_hex:
| push ax
| call nibble_to_hex
| mov dl, al
| pop ax
| shr al, 1
| shr al, 1
| shr al, 1
| shr al, 1
| call nibble_to_hex
| mov dh, al
| ret
|
| nibble_to_hex:
| and al, 0fh
| add al, 90h
| daa
| adc al, 40h
| daa
| ret
|
| Works on every single x86 processor from the 8088/8086 on up. Fastest
| on the 8088. From the 486 on up, a look-up table is faster than the
| add/daa/adc/daa sequence.

Yes, a lookup table is the fastest solution for values larger than one byte,
or when the table is already cached; otherwise expect a penalty of many clock
cycles (~35 on K7) for a 'new' data fetch. The DAA method is the shortest but
very slow.

I found this to be a good compromise in terms of size and speed
(26 bytes, 10..12 clock cycles without call/return, i.e. as a macro):

[macro: ascii2hex ;al -> ax]
mov ah,al
shr al,4 ;use four 'shr al,1' on CPUs older than the 80186
cmp al,0ah ;'and al,0fh' is redundant after the shift
jc $+4 ;below 10: skip the next 2-byte add ($ = address of this jc)
add al,07h
add al,30h
xchg al,ah ;ah now holds the high-nibble character
and al,0fh
cmp al,0ah
jc $+4 ;below 10: skip the next 2-byte add ($ = address of this jc)
add al,07h
add al,30h ;al holds the low-nibble character
[/macro]
__
wolfgang
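
For readers who do not follow x86 flags easily, the compare-and-add idea in the macro above can be modelled in C. This is only a sketch of the technique, not a translation anyone in the thread posted; the names nibble_to_ascii and byte_to_hex are hypothetical.

```c
#include <stdint.h>

/* Convert a 4-bit value (0..15) to its ASCII hex digit, following the
   cmp / jc / add 07h / add 30h steps of the macro. */
static char nibble_to_ascii(uint8_t nibble)
{
    nibble &= 0x0F;                 /* and al,0fh */
    if (nibble >= 0x0A)             /* cmp al,0ah ; jc skips the add when below 10 */
        nibble += 0x07;             /* bridge the gap between '9' (39h) and 'A' (41h) */
    return (char)(nibble + 0x30);   /* add al,30h */
}

/* Hypothetical helper: *hi receives the high digit (AH in the macro),
   *lo receives the low digit (AL in the macro). */
static void byte_to_hex(uint8_t value, char *hi, char *lo)
{
    *hi = nibble_to_ascii((uint8_t)(value >> 4));
    *lo = nibble_to_ascii(value);
}
```

Calling byte_to_hex(0x3C, &hi, &lo) leaves '3' in hi and 'C' in lo.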


amitkr

2005-01-14, 8:55 pm

Hey, why do you use

shr al, 1

four times instead of shr al, 4?

I believe both will give the same result, right?

Please clarify.
Thanks


Jack Klein

2005-01-15, 8:55 am

On Mon, 10 Jan 2005 18:33:17 +0000 (UTC), "amitkr"
<spamtrap@crayne.org> wrote in comp.lang.asm.x86:

> Hi, I am a noobie so any help will be much appreciated...
>
> I am trying to read a file and display the bytes read...
>
> I am able to do so... but the output is in ASCII but I want the result to
> be displayed as HEX values....
>
>
> How to do this... please help


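; character to hex
; called with the character to be converted in al
; returns high ASCII hex character in DH, low
; ASCII hex character in DL

byte_to_hex:
push ax ; save the original byte
call nibble_to_hex ; convert the low nibble
mov dl, al
pop ax ; restore the original byte
shr al, 1 ; move the high nibble down one bit at a
shr al, 1 ; time (the 8086/8088 has no immediate
shr al, 1 ; shift count other than 1)
shr al, 1
call nibble_to_hex ; convert the high nibble
mov dh, al
ret

nibble_to_hex:
and al, 0fh ; keep only the low nibble
add al, 90h ; 90h..9Fh
daa ; 0..9: 90h..99h, CF=0; A..F: 00h..05h, CF=1
adc al, 40h ; 0..9: D0h..D9h; A..F: 41h..46h
daa ; 0..9: 30h..39h ('0'..'9'); A..F: unchanged ('A'..'F')
ret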

Works on every single x86 processor from the 8088/8086 on up. Fastest
on the 8088. From the 486 on up, a look-up table is faster than the
add/daa/adc/daa sequence.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~.../FAQ-acllc.html
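
Why add/daa/adc/daa yields ASCII is easier to see by stepping through the flags. The following C sketch emulates just the parts of ADD, ADC and DAA that the routine above relies on, using the documented x86 DAA behaviour. The Cpu struct and the function names are made up for illustration.

```c
#include <stdint.h>

/* Minimal model of the state this trick relies on. */
typedef struct {
    uint8_t al;  /* accumulator */
    int cf;      /* carry flag */
    int af;      /* auxiliary (half) carry flag */
} Cpu;

/* add/adc al, imm8: carry_in is 0 for ADD and the current CF for ADC. */
static void add8(Cpu *c, uint8_t imm, int carry_in)
{
    unsigned sum = (unsigned)c->al + imm + (unsigned)carry_in;
    c->af = ((c->al & 0x0F) + (imm & 0x0F) + carry_in) > 0x0F;
    c->cf = sum > 0xFF;
    c->al = (uint8_t)sum;
}

/* DAA: decimal-adjust AL after addition. The final CF is decided by the
   second test, so the intermediate carry of the +6 step is not tracked. */
static void daa(Cpu *c)
{
    uint8_t old_al = c->al;
    int old_cf = c->cf;

    if ((c->al & 0x0F) > 9 || c->af) {
        c->al = (uint8_t)(c->al + 0x06);
        c->af = 1;
    } else {
        c->af = 0;
    }
    if (old_al > 0x99 || old_cf) {
        c->al = (uint8_t)(c->al + 0x60);
        c->cf = 1;
    } else {
        c->cf = 0;
    }
}

static char nibble_to_hex_daa(uint8_t nibble)
{
    Cpu c = { (uint8_t)(nibble & 0x0F), 0, 0 };  /* and al, 0fh */
    add8(&c, 0x90, 0);     /* add al, 90h : 90h..9Fh */
    daa(&c);               /* 0..9 -> 90h..99h, CF=0; 10..15 -> 00h..05h, CF=1 */
    add8(&c, 0x40, c.cf);  /* adc al, 40h : D0h..D9h, or 41h..46h */
    daa(&c);               /* D0h..D9h -> 30h..39h; 41h..46h stay as they are */
    return (char)c.al;
}
```

The first DAA turns "is this nibble 10 or more" into the carry flag, and the ADC then folds that carry into the result, which is what bridges the gap between '9' and 'A'.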

Matt

2005-01-15, 8:55 am

"amitkr" <spamtrap@crayne.org> wrote in message
news:4fcda74c6701931189c3e0a36ef1a59e@localhost.talkaboutprogramming.com...
> Hey why you use
>
> shr al, 1
>
> 4 times instead of shr al,4


So that it will run on the 8086 and 8088, which can only shift by 1 or by the
count in CL. The immediate-count form (shr al, 4) was added with the 80186.

> I belive both will give the same result... right


Yes, on an 80186 or later. There, a single shr al, 4 is also shorter and
usually faster than four single-bit shifts.

-Matt
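
The equivalence of the two forms is easy to check exhaustively. A small C sketch with hypothetical function names, where each `>>= 1` stands in for one shr al, 1:

```c
#include <stdint.h>

/* 8086/8088 style: four single-bit shifts (shr al,1 repeated). */
static uint8_t high_nibble_8086(uint8_t al)
{
    al >>= 1;
    al >>= 1;
    al >>= 1;
    al >>= 1;
    return al;
}

/* 80186 and later: one shift with an immediate count (shr al,4). */
static uint8_t high_nibble_186(uint8_t al)
{
    return (uint8_t)(al >> 4);
}
```

Both return the same value for every possible byte, so the choice only affects code size, speed, and which processors can run it.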

spamtrap@crayne.org

2005-01-15, 8:55 am

On Tue, 11 Jan 2005 06:59:47 +0000 (UTC), "amitkr"
<spamtrap@crayne.org> wrote:

>Hey why you use
>
>shr al, 1
>
>4 times instead of shr al,4
>
>I belive both will give the same result... right


Not on an 8088 or 8086, where shr al,4 is not a valid instruction.

On later processors, yes.

From the OP's message:
>Works on every single x86 processor from the 8088/8086 on up.


--
Arargh501 at [drop the 'http://www.' from ->] http://www.arargh.com
BCET Basic Compiler Page: http://www.arargh.com/basic/index.html

To reply by email, remove the garbage from the reply address.

Frank Kotler

2005-01-15, 8:55 am

wolfgang kern wrote:

> DAA -method is shortest but very slow.


; nibble masked off
cmp al, 0Ah
sbb al, 69h
das

shorter? possibly even slower?

Best,
Frank
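
The three-instruction cmp/sbb/das sequence above can be traced the same way in C. This sketch emulates only the flag behaviour the sequence depends on, following the documented semantics of SBB and DAS; nibble_to_hex_das is a hypothetical name.

```c
#include <stdint.h>

/* Emulates cmp al,0Ah / sbb al,69h / das for one nibble (0..15). */
static char nibble_to_hex_das(uint8_t nibble)
{
    uint8_t al = nibble & 0x0F;          /* "nibble masked off" */
    int cf, af;

    /* cmp al, 0Ah : CF = 1 when al is below 10 (a borrow occurs) */
    cf = al < 0x0A;

    /* sbb al, 69h : al = al - 69h - CF, tracking the half borrow in AF */
    af = (al & 0x0F) < (0x69 & 0x0F) + cf;
    int borrow = al < 0x69 + cf;         /* always 1 here, since al <= 0Fh */
    al = (uint8_t)(al - 0x69 - cf);
    cf = borrow;

    /* das : decimal-adjust AL after subtraction */
    uint8_t old_al = al;
    if ((al & 0x0F) > 9 || af)
        al = (uint8_t)(al - 0x06);
    if (old_al > 0x99 || cf)
        al = (uint8_t)(al - 0x60);

    return (char)al;
}
```

For 0..9 the SBB produces 96h..9Fh with a half borrow, so DAS subtracts 66h in total and lands on 30h..39h. For 10..15 it produces A1h..A6h without a half borrow, so only 60h is subtracted, giving 41h..46h.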

Jack Klein

2005-01-19, 3:56 am

On Tue, 11 Jan 2005 16:11:51 +0000 (UTC), "wolfgang kern"
<nowhere@nevernet.at> wrote in comp.lang.asm.x86:

>
> Jack Klein wrote:
>
> [..Ascii2hex]
> | ; character to hex
> | ; called with the character to be converted in al
> | ; returns high ASCII hex character in DH, low
> | ; ASCII hex character in DL
> |
> | byte_to_hex:
> | push ax
> | call nibble_to_hex
> | mov dl, al
> | pop ax
> | shr al, 1
> | shr al, 1
> | shr al, 1
> | shr al, 1
> | call nibble_to_hex
> | mov dh, al
> | ret
> |
> | nibble_to_hex:
> | and al, 0fh
> | add al, 90h
> | daa
> | adc al, 40h
> | daa
> | ret
> |
> | Works on every single x86 processor from the 8088/8086 on up. Fastest
> | on the 8088. From the 486 on up, a look-up table is faster than the
> | add/daa/adc/daa sequence.
>
> Yes, lookup table solution is the fastest for larger than one byte values
> or if already cached ahead, expect many clock-cycles penalty (~35 on K7)
> for a 'new' data-fetch. DAA -method is shortest but very slow.


I translated this code from source for various 8-bitters that existed before
there was an x86. It worked on the 8080, 8085, Z80, 6809, and almost anything
with a half-carry flag and a DAA or equivalent instruction.

Frank Kotler posted a routine with fewer instructions, using DAS, that
looks faster. I had code that had worked on half a dozen different
8-bit processors and never thought of that. As for being very slow, I
don't think anything other than Frank's code would be faster on an
8088, and there were an awful lot of 8088s out there 20 to 25 years
ago.

> I found this as a good compromise in terms of size and speed:
> (26 bytes, 10..12 clock-cycles w/o call/return, ie: as a macro)


I guarantee you, not 10 to 12 clock cycles on an 8088! Probably not
even on any Pentium, if you take either of those two jumps.
When I did a 486 project (32-bit flat mode), I discovered that a
look-up table was just about as fast as DAA; on an original Pentium
it was faster.

>
> [macro: ascii2hex ;al -> ax]
> mov ah,al
> shr al,4 ;use four 'shr al,1' on CPUs older than the 80186
> cmp al,0ah ;'and al,0fh' is redundant after the shift
> jc $+4 ;below 10: skip the next 2-byte add ($ = address of this jc)
> add al,07h
> add al,30h
> xchg al,ah ;ah now holds the high-nibble character
> and al,0fh
> cmp al,0ah
> jc $+4 ;below 10: skip the next 2-byte add ($ = address of this jc)
> add al,07h
> add al,30h ;al holds the low-nibble character
> [/macro]
> __
> wolfgang


--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~.../FAQ-acllc.html
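
The look-up table approach mentioned in this thread was not posted as code. A minimal C sketch of it, applied to the original question of showing a buffer of bytes as hex, could look like this. HEX_DIGITS and bytes_to_hex are hypothetical names, and the space-separated output format is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* One table entry per nibble value. */
static const char HEX_DIGITS[] = "0123456789ABCDEF";

/* Writes two hex digits per input byte, separated by single spaces, and
   NUL-terminates the result. dst must hold at least 3 * n bytes, or 1 byte
   when n is 0. */
static void bytes_to_hex(const uint8_t *src, size_t n, char *dst)
{
    for (size_t i = 0; i < n; i++) {
        *dst++ = HEX_DIGITS[src[i] >> 4];     /* high nibble */
        *dst++ = HEX_DIGITS[src[i] & 0x0F];   /* low nibble */
        *dst++ = ' ';
    }
    if (n > 0)
        dst--;                                /* overwrite the trailing space */
    *dst = '\0';
}
```

Feeding it the four bytes 48h 69h 0Ah FFh produces the string "48 69 0A FF". The 16-byte table is small, but as noted earlier in the thread, a table that is not already cached costs a data fetch.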

wolfgang kern

2005-01-19, 4:07 pm


Jack Klein wrote:

[about..Ascii2hex]
| > Yes, lookup table solution is the fastest for larger than one byte values
| > or if already cached ahead, expect many clock-cycles penalty (~35 on K7)
| > for a 'new' data-fetch. DAA -method is shortest but very slow.

| Translated this code from source for various 8-bitters before there
| was an x86. It worked on 8080, 8085, Z80, 6809, almost everything
| with a half carry flag and a DAA or equivalent instruction.

| Frank Kotler posted a routine with fewer instructions using DAS that
| looks faster. I had code that had worked on half a dozen different
| 8-bit processors and never thought of that. As for being very slow, I
| don't think anything other than Frank's code would be faster on an
| 8088, and there were an awful lot of 8088's out there 20 to 25 years
| ago.

Sure, I just think the 8088 has become very rare these days.

| > I found this as a good compromise in terms of size and speed:
| > (26 bytes, 10..12 clock-cycles w/o call/return, ie: as a macro)
[..]
| I guarantee you, not 10 to 12 clock cycles on an 8088! Probably not
| even on a Pentium anything, if you take either of those two jumps.
| When I did a 486 project (32 bit flat mode) I discovered that a
| look-up table was just about as fast as DAA, on an original Pentium,
| faster.

My timings were measured on a K7 (a P4 may show similar timing here),
with the code already cached and aligned so that it does not cross an
L1 cache-line boundary, so these two jumps won't cause a new code fetch.


__
wolfgang


Michael Brown

2005-01-31, 3:56 pm

Frank Kotler wrote:
> wolfgang kern wrote:
>
>
> ; nibble masked off
> cmp al, 0Ah
> sbb al, 69h
> das
>
> shorter? possibly even slower?


Continuing the "funky tricks with sbb" theme that seems to be running this
month:

mov ah,al
and al,0x0F
shr ah,4

cmp ah,10
sbb bh,bh
cmp al,10
sbb bl,bl

and bx,0x0707
add ax,0x3737
sub ax,bx

9 cycles all up (in 32-bit mode) on a K7 (Wolfgang's runs at 12.3
cycles/byte by my very primitive benchmarking proc). And it sneaks in at
25 bytes :) Obviously, the code above can be extended to 32-bit (or
64-bit) GPRs as well, or MMX/SSE2 if you're that way inclined.

--
Michael Brown
www.emboss.co.nz : OOS/RSI software and more :)
Add michael@ to emboss.co.nz ---+--- My inbox is always open
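
The branch-free sbb trick above translates fairly directly to C. In this sketch the local variables are named after the registers they stand in for, and the function name byte_to_hex_branchless is hypothetical. The result is packed the way AX would be: high digit in the upper byte, low digit in the lower byte.

```c
#include <stdint.h>

static uint16_t byte_to_hex_branchless(uint8_t value)
{
    uint8_t ah = (uint8_t)(value >> 4);   /* mov ah,al ; shr ah,4 */
    uint8_t al = value & 0x0F;            /* and al,0x0F */

    /* cmp x,10 ; sbb r,r : r becomes FFh when x is below 10, else 00h */
    uint8_t bh = (uint8_t)-(ah < 10);
    uint8_t bl = (uint8_t)-(al < 10);

    uint16_t ax = (uint16_t)((ah << 8) | al);
    uint16_t bx = (uint16_t)(((bh << 8) | bl) & 0x0707);  /* and bx,0x0707 */

    ax = (uint16_t)(ax + 0x3737);  /* add ax,0x3737 : treat both nibbles as letters */
    ax = (uint16_t)(ax - bx);      /* sub ax,bx : take 7 back where the nibble is 0..9 */
    return ax;
}
```

For the input 3Ch this returns 3343h, i.e. '3' in the high byte and 'C' in the low byte. Adding 37h first and then subtracting 7 for decimal digits is what removes the need for a conditional jump.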

wolfgang kern

2005-01-31, 8:55 pm


Michael Brown wrote:

[..]
| Continuing the "funky tricks with sbb" theme that seems to be running this
| month:
|
| mov ah,al
| and al,0x0F
| shr ah,4
|
| cmp ah,10
| sbb bh,bh
| cmp al,10
| sbb bl,bl
|
| and bx,0x0707
| add ax,0x3737
| sub ax,bx
|
| 9 cycles all up (in 32-bit mode) on a K7 (Wolfgang's runs at 12.3
| cycles/byte by my very primitive benchmarking proc). And it sneaks in at
| 25 bytes :) Obviously, the code above can be extended to 32-bit (or
| 64-bit) GPRs as well, or MMX/SSE2 if you're that way inclined.

Right, my old disassembler already uses a similar '4 nibbles at once' version.
The new version (HEXEDIT64) will also cover all 64-bit expressions, even though
so far I have found only two (B8..BF and A0..A3) besides the 'physical address
view'.

I'm not sure about using MMX/XMM, as I find GPR usage much faster.

__
wolfgang
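
The thread does not show any '4 nibbles at once' code, so the following is only a guess at what such a routine might look like in a single 32-bit general-purpose register. The function name word_to_hex4 and the bit-spreading steps are assumptions, not anyone's posted method.

```c
#include <stdint.h>

/* Converts a 16-bit value to four ASCII hex digits packed into 32 bits,
   most significant digit in the top byte. */
static uint32_t word_to_hex4(uint16_t value)
{
    uint32_t v = value;

    /* Spread the four nibbles so that each one sits in its own byte. */
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;

    /* A byte holds 10..15 exactly when adding 6 carries into bit 4. */
    uint32_t letters = ((v + 0x06060606u) >> 4) & 0x01010101u;

    /* Add '0' to every byte, plus 7 more where the digit is A..F. */
    return v + 0x30303030u + letters * 7u;
}
```

For the input ABCDh this returns 41424344h, the characters 'A', 'B', 'C', 'D' from the top byte down. No byte ever exceeds 46h, so nothing carries from one byte into the next.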

