









how should printf "%d",x behave when x is a very large value?
Andrew Schorr

2006-01-11, 6:57 pm

This is another tricky issue that has come up on bug.gnu.utils
recently.
The question is, given the following script:

awk -v n=100 'BEGIN {printf "%d\n",2^n}'

what should the output be for different values of n? Suppose,
for example, that n is 100. Should it print
1267650600228229401496703205376, or should
it print a floating-point approximation? Clearly, for
small n, it should print the exact value. And one might
argue that for huge n, it no longer makes sense to
print it as an integer. But my question is where the
breakpoint should be. One possibility is to handle
this the same way as if "%.0f" were used. In that
case, it would always print as an integer. Another
argument might be to change behavior once the
value exceeds the maximum integer resolution
of the IEEE double-precision representation (i.e. 2^53,
beyond which not every integer can be represented exactly).

Different implementations seem to vary on how they
handle this. Some just print 2147483647 (2^31 - 1) for any
number larger than that.

Thoughts?

Regards,
Andy

Andrew Schorr

2006-01-11, 6:57 pm

I should add that another logical approach would be to change
behavior once the value exceeds the maximum value representable
in an integer type on the given platform (typically 2^31 - 1 or
2^63 - 1 for a signed 32- or 64-bit type).

The question in my mind is what would be the most logical,
consistent, and expected behavior.
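One wrinkle with an integer-type cut-off is that awk only ever holds the value as a double. The sketch below (helper name hypothetical, IEEE-double awk assumed) shows that the signed 32-bit maximum 2^31 - 1 survives intact, while the signed 64-bit maximum 2^63 - 1 is already rounded to 2^63 before printf ever sees it. So a 64-bit cut-off would sit well above the 2^53 point where exact integers end.

```shell
# Hypothetical helper: how the signed 32- and 64-bit maxima look once
# stored as awk numbers (IEEE doubles).
show_int_limits() {
    awk 'BEGIN {
        printf "%.0f\n", 2^31 - 1       # 2147483647: far below 2^53, exact
        printf "%.0f\n", 2^63 - 1       # rounds up to 2^63 as a double
        print ((2^63 - 1) == 2^63)      # 1: both are the same double
    }'
}

show_int_limits
```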

Regards,
Andy

John DuBois

2006-01-11, 6:57 pm

In article <1137007592.506892.255950@g44g2000cwa.googlegroups.com>,
Andrew Schorr <aschorr@telemetry-investments.com> wrote:
>
>The question in my mind is what would be the most logical,
>consistent, and expected behavior.


If I weren't used to gawk's behavior, I would expect that %d would always
produce output consisting exclusively of '-' and digits (as you get in practice
with a large .precision).

John
--
John DuBois spcecdt@armory.com KC6QKZ/AE http://www.armory.com/~spcecdt/
Harlan Grove

2006-01-11, 6:57 pm

Andrew Schorr wrote...
>This is another tricky issue that has come up on bug.gnu.utils
>recently.
>The question is, given the following script:
>
> awk -v n=100 'BEGIN {printf "%d\n",2^n}'
>
>what should the output be for different values of n? Suppose,
>for example, that n is 100. Should it print
>1267650600228229401496703205376, or should
>it print a floating-point approximation? Clearly, for
>small n, it should print the exact value. And one might
>argue that for huge n, it no longer makes sense to
>print it as an integer. But my question is where the
>breakpoint should be. . . .

....

Since awk provides (IEEE) double-precision floating point, and 2^100
falls within the double-precision range: if awk's printf %d is meant to
be an extension of C's printf %d, so that values just outside the range
of long integers are printed in full precision, then any exactly
representable integer value should be printed that way. On the other
hand, if you're going to impose an arbitrary cut-off, you might as well
use the long integer range.

Andrew Schorr

2006-01-12, 6:56 pm

So it sounds like you both would advocate treating "%d" as essentially
equivalent to "%.0f"?
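One detail matters if "%d" is treated as "%.0f": for non-integral values the two differ, because "%d" truncates toward zero while "%.0f" rounds. The sketch below (helper name hypothetical) shows that applying awk's int() first gives "%d"-style truncation while still printing every digit of a large value.

```shell
# Hypothetical helper: "%d" truncates toward zero, "%.0f" rounds,
# so a faithful "%d via %.0f" needs int() first.
compare_d_and_f() {
    awk -v x="$1" 'BEGIN {
        v = x + 0                    # force a numeric value
        printf "%d\n", v             # truncates: 2.7 -> 2
        printf "%.0f\n", v           # rounds:    2.7 -> 3
        printf "%.0f\n", int(v)      # truncate first, then print all digits
    }'
}

compare_d_and_f 2.7
compare_d_and_f -2.7
```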

Regards,
Andy

John DuBois

2006-01-12, 6:56 pm

In article <1137078550.345793.302110@f14g2000cwb.googlegroups.com>,
Andrew Schorr <aschorr@telemetry-investments.com> wrote:
>So it sounds like you both would advocate treating "%d" as essentially
>equivalent to "%.0f"?


For my part - yes.

John
--
John DuBois spcecdt@armory.com KC6QKZ/AE http://www.armory.com/~spcecdt/
Don Stokes

2006-01-12, 6:56 pm

In article <1137004830.351128.193260@f14g2000cwb.googlegroups.com>,
Andrew Schorr <aschorr@telemetry-investments.com> wrote:
>This is another tricky issue that has come up on bug.gnu.utils
>recently.
>The question is, given the following script:
>
> awk -v n=100 'BEGIN {printf "%d\n",2^n}'
>
>what should the output be for different values of n? Suppose,
>for example, that n is 100. Should it print
>1267650600228229401496703205376, or should
>it print a floating-point approximation? Clearly, for
>small n, it should print the exact value. And one might
>argue that for huge n, it no longer makes sense to
>print it as an integer. But my question is where the
>breakpoint should be. One possibility is to handle


Hmmm:

[don@bsd ~]$ gawk 'BEGIN { printf "%d\n", 2^63 }'
9223372036854775808
[don@bsd ~]$ gawk 'BEGIN { printf "%d\n", 2^64 }'
0
[don@bsd ~]$ gawk 'BEGIN { printf "%d\n", 2^65 }'
3.68935e+19

[don@bsd ~]$ gawk 'BEGIN { printf "%.0f\n", 2^63 }'
9223372036854775808
[don@bsd ~]$ gawk 'BEGIN { printf "%.0f\n", 2^64 }'
18446744073709551616
[don@bsd ~]$ gawk 'BEGIN { printf "%.0f\n", 2^65 }'
36893488147419103232

So I guess the simple answer is to use "%.0f" to print large integers,
bearing in mind that 2^64 is somewhat outside the accuracy of a 64 bit
floating point number ...

-- don
Harlan Grove

2006-01-12, 6:56 pm

Don Stokes wrote...
....
>So I guess the simple answer is to use "%.0f" to print large integers,
>bearing in mind that 2^64 is somewhat outside the accuracy of a 64 bit
>floating point number ...


2^64 is exactly representable in IEEE double-precision floating point.
Any value whose set bits all fall within a window of 53 adjacent powers
of 2, with the highest one between 2^-1022 and 2^1023, is exactly
representable. Numbers like 2^64 - 1 (= 2^63 + 2^62 + ... + 2^0) and
2^64 + 1 (= 2^64 + 2^0) aren't.
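The claim about 2^64 is easy to check from awk. In the sketch below (helper name hypothetical, IEEE-double awk assumed), 2^64 prints exactly, 2^64 - 1 and 2^64 + 1 both collapse onto it, and the next representable value up is 2^64 + 2^12. That value's set bits span 2^12 through 2^64, a window of exactly 53 bits.

```shell
# Hypothetical helper: neighbours of 2^64 as IEEE doubles. The step from
# 2^64 to the next double up is 2^12 (53-bit significand), so +/-1 is lost.
show_2_64_neighbours() {
    awk 'BEGIN {
        printf "%.0f\n", 2^64            # exact
        printf "%.0f\n", 2^64 - 1        # rounds to 2^64
        printf "%.0f\n", 2^64 + 1        # rounds to 2^64
        printf "%.0f\n", 2^64 + 2^12     # next double above 2^64
    }'
}

show_2_64_neighbours
```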

Andrew Schorr

2006-01-13, 6:56 pm


Don Stokes wrote:
> [don@bsd ~]$ gawk 'BEGIN { printf "%d\n", 2^64 }'
> 0


This is a known gawk bug (actually, that's how this whole thread of
discussion got started). There is a patch available; let me know if
interested.

Regards,
Andy

Marek Simon

2006-01-25, 6:56 pm

The awk manual says that all numbers are internally stored as
double-precision floating-point numbers. It then says the printf function
works exactly like the C printf function. So I think "%d" converts the value
to a long int (max value 2^63-1 on a 64-bit platform) and, if the value is
outside that range, prints it as a float.
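That reading can be sketched in awk itself. The function fmt_d below is hypothetical and only models the rule as described (integer output inside the signed 64-bit range, "%g" outside it); it is not gawk's actual code, and the cut-off at 2^63 is an assumption taken from that description. Its output for 2^65 matches the 3.68935e+19 gawk printed in Don Stokes's transcript, but gawk's real handling of 2^63 and 2^64 differed from this sketch.

```shell
# Hypothetical helper modelling the described rule: integer digits while
# the value fits a signed 64-bit long, "%g" (floating-point style) otherwise.
fmt_d() {
    awk -v x="$1" 'BEGIN {
        v = x + 0                        # force a numeric value
        limit = 2^63                     # 2^63 - 1 rounds to this anyway
        if (v >= -limit && v < limit)
            printf "%.0f\n", int(v)      # stands in for the long conversion
        else
            printf "%g\n", v             # out of range: float-style output
    }'
}

fmt_d 4611686018427387904      # 2^62: printed in full
fmt_d 36893488147419103232     # 2^65: printed as 3.68935e+19
```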
Marek


Copyright 2008 codecomments.com