Home > Archive > Fortran > January 2008 > REAL -> string -> REAL
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
REAL -> string -> REAL
|
|
| Gus Gassmann 2008-01-22, 8:22 am |
| Having all sorts of problems in C/C++ with the conversion in the
subject line, I thought to try my luck in the Fortran forum.
The situation is this: I have a (large number of) floating point
number(s), let's assume in double precision --- this might be subject
to change --- that I have to write to a file in ASCII format (an XML
file, if that makes a difference) and subsequently read back in (into
the same type of real that I used before; I'll make sure of this
somehow).
Are there any tools available in Fortran that will allow me to do the
conversion from REAL (assume binary) to string (assume decimal) and
back to REAL _without_loss_of_precision_? (That is, I must be
guaranteed that the two versions of the real number have the same
internal representation. Side condition: I'd like to use the smallest
string representation that will give this guarantee.
There is a paper by David Gay (c. 1990) in which he describes the
algorithm, but I have not been able to locate the C-code that goes
with it. What is the situation in the Fortran world? Do I have to
write my own implementation from scratch?
Thanks for any replies.
| |
|
| > There is a paper by David Gay (c. 1990) in which he describes the
> algorithm, but I have not been able to locate the C-code that goes with
> it. What is the situation in the Fortran world? Do I have to write my
> own implementation from scratch?
Most compilers should do what you describe (ie lossless I/O) when you use
the default format:
write (unit,*) my_real
even though I'm not too sure this is actually required by the Standard.
For the "minimum width" requirement, a few compiler do that (the other
ones prefer a fixed-width default format), including g95 (which, IIRC,
actually uses D. Gay's algorithm).
As for C code implementing the algorithm you mention, there is a version
available here: http://www.netlib.org/fp/
--
FX
| |
| Steve Lionel 2008-01-22, 7:27 pm |
| On Jan 22, 8:24 am, "FX" <coud...@alussinan.org> wrote:
> Most compilers should do what you describe (ie lossless I/O) when you use
> the default format:
>
> write (unit,*) my_real
>
> even though I'm not too sure this is actually required by the Standard.
It's not required and you can't rely on this. I'd be astonished if it
was actually the case in any implementation.
If I recall from IEEE 754, one needs at least three additional decimal
digits past the stated precision to guarantee lossless conversion to
and from decimal. I may be misremembering the details. List-directed
output may give you one digit extra, if that - the number of digits is
implementation-dependent.
I think you'd be safe to write out the values in an E format with
three or four more digits in the fraction past what the datatype
provides. For example, E17.10E2 for single-precision and E27.19E3 for
double precision. This assumes of course that the implementation
converts such values precisely.
Steve
| |
|
| > It's not required and you can't rely on this. I'd be astonished if it
> was actually the case in any implementation.
Well, it is the case for gfortran and g95, at least. (For gfortran, some
system C libraries have trouble correctly dealing with denormals, but
apart from that, it is working fine.) I am a bit if other
implementations don't take care of that; I think the default format
should be one of lossless transmission of information.
In gfortran, the default formats are fixed-width and correspond to:
- 1PG15.8E2 for REAL(4)
- 1PG25.17E3 for REAL(8)
- 1PG29.20E4 for REAL(10)
- 1PG44.35E4 for REAL(16)
I think they're safe.
--
FX
| |
| James Van Buskirk 2008-01-22, 7:27 pm |
| "Steve Lionel" <steve.lionel@intel.com> wrote in message
news:f8e3dc49-5c10-4249-bad6-2d359abd856d@v4g2000hsf.googlegroups.com...
> On Jan 22, 8:24 am, "FX" <coud...@alussinan.org> wrote:
[color=darkred]
[color=darkred]
[color=darkred]
> It's not required and you can't rely on this. I'd be astonished if it
> was actually the case in any implementation.
When I require exactly reproducible results as I do for every post :)
I write out more digits than I need and start deleting, reading them
back in to see where a discrepancy arises. That works provided the
same processor is going to read it next time, but it seems to me that
a different processor may use a slightly different algorithm to read
so I also check adjacent possible outputs to attempt to detect
borderline cases and include an extra digit if so. Works great by
hand, but I haven't automated it.
P.S. Steve, I looked at your forum today:
C:\gfortran\clf\change_rank>type change_rank.f90
program change_rank
use ISO_C_BINDING
implicit none
real, target :: x3(3,4)
real, pointer :: x1(:)
integer, pointer :: i1(:)
type(C_PTR) intermediate
character(80) fmt
x3 = reshape([ 1.0, 1.1, 1.2, 1.3, &
2.0, 2.1, 2.2, 2.3, &
3.0, 3.1, 5.7, 3.3], &
shape(x3), order = [2,1])
intermediate = C_LOC(x3)
write(fmt,'(a,i0,a)') '(',size(x3,2),'(f3.1:1x))'
write(*,'(a)') 'Original matrix ='
write(*,fmt) transpose(x3)
call C_F_POINTER(intermediate, x1, [size(x3)])
write(fmt,'(a,i0,a)') '(',size(x1,1),'(f3.1:1x))'
write(*,'(/a)') 'Rank-1 matrix ='
write(*,fmt) x1
call C_F_POINTER(intermediate, i1, [size(x3)])
write(fmt,'(a,i0,a)') '(',size(i1,1)/2,'(z8.8:1x))'
write(*,'(/a)') 'Bits of matrix ='
write(*,fmt) i1
end program change_rank
C:\gfortran\clf\change_rank>C:\gfortran\win64\bin\x86_64-pc-mingw32-gfortran
cha
nge_rank.f90 -ochange_rank
C:\gfortran\clf\change_rank>change_rank
Original matrix =
1.0 1.1 1.2 1.3
2.0 2.1 2.2 2.3
3.0 3.1 5.7 3.3
Rank-1 matrix =
1.0 2.0 3.0 1.1 2.1 3.1 1.2 2.2 5.7 1.3 2.3 3.3
Bits of matrix =
3F800000 40000000 40400000 3F8CCCCD 40066666 40466666
3F99999A 400CCCCD 40B66666 3FA66666 40133333 40533333
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
| |
| Richard Maine 2008-01-22, 7:27 pm |
| FX <coudert@alussinan.org> wrote:
>
> Most compilers should do what you describe (ie lossless I/O) when you use
> the default format:
>
> write (unit,*) my_real
>
> even though I'm not too sure this is actually required by the Standard.
This is definitely not required by the standard. I might even argue that
the language of the standard hints more against than for this. The
standard requires that the field widths be selected to be "reasonable"
(or some such simillar term). If the standard wanted to specify lossless
conversion, it could have done that explicitly enough. Certainly
lossless conversion would be allowed as one definition of "reasonable",
but it is *NOT* required. My personal definition of "reasonable" would
be quite different, for example.
I most vehemently recommend against people depending on this one. That's
not what list-directed output is for. This is just variant number 42 of
the same fundamental problem with misuse of list-directed output.
List-directed output is for fast (to the programmer) and simple output
when you are not picky about the details. If you are picky about the
details, then don't use list-directed output. Those who ignore this
advice (or haven't heard or deduced it) are regularly comming here for
help when their expectations are not met.
Please do not encourage further people to make this mistake.
That some particular assumption about list-directed output might happen
to be true on some particular compilers isn't good enough.
--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
| |
| glen herrmannsfeldt 2008-01-22, 7:27 pm |
| Steve Lionel wrote:
> On Jan 22, 8:24 am, "FX" <coud...@alussinan.org> wrote:
[color=darkred]
[color=darkred]
[color=darkred]
> It's not required and you can't rely on this. I'd be astonished if it
> was actually the case in any implementation.
Java requires it, and I hate it. Well, it wouldn't be so bad
if they provided a convenient alternative.
It is nice to have simple values come out with a simple representation,
especially for quick testing.
It might make some sense for NAMELIST output, which is rarely used
otherwise.
-- glen
| |
| glen herrmannsfeldt 2008-01-22, 7:27 pm |
| Gus Gassmann wrote:
> Having all sorts of problems in C/C++ with the conversion in the
> subject line, I thought to try my luck in the Fortran forum.
> The situation is this: I have a (large number of) floating point
> number(s), let's assume in double precision --- this might be subject
> to change --- that I have to write to a file in ASCII format (an XML
> file, if that makes a difference) and subsequently read back in (into
> the same type of real that I used before; I'll make sure of this
> somehow).
Traditionally when you wanted to write out data and read in the
exact same data you used UNFORMATTED I/O.
For many years of Fortran there were many different floating point
representations that it didn't make sense to ask for same value
back again on a different system. Over the years, the default REAL
has varied from 36 bits, to 32, to 60, to 64, on different systems.
(Probably some others in there, too.)
> Are there any tools available in Fortran that will allow me to do the
> conversion from REAL (assume binary) to string (assume decimal) and
> back to REAL _without_loss_of_precision_? (That is, I must be
> guaranteed that the two versions of the real number have the same
> internal representation. Side condition: I'd like to use the smallest
> string representation that will give this guarantee.
You could do it in a loop, using internal I/O, write out the
value with different format widths in E format, and do binary
search to find the appropriate width. That may only guarantee it
using the same library, though. If I were doing this, I might
try for a base 16 system.
-- glen
| |
|
| > List-directed output is for fast (to the programmer) and simple output
> when you are not picky about the details.
I was mistaken. It seems rather awkard, then, than there is no simple (or
even moderately advanced) way of specifying lossless formatted I/O.
Writing with a large enough format requires knowldge of the number model
of the processor, and thus isn't portable, and James' aumated procedure
(start with large formats, and go to smaller formats while making sure
you can still read it back) is painful to code and certainly not very
efficient!
--
FX
| |
| James Giles 2008-01-22, 7:27 pm |
| Using formatted I/O should work, but not usually list directed I/O
(the one where * is the format). I've posted the appropriate formula
on this forum before. If your binary representation has B bits in its
significand, the required number of significant decimal digits D is
given by:
10^(D-1) > 2^B
Privided D satisfies this relation, convertion to decimal and then
back to binary yields the identical values (assuming your implementation's
conversion is done correctly). For IEEE single precision, where
B is 24, D must be at least 9. For IEEE double precision, where
B is 53, D must be at least 17. For IEEE double extended (the 80
bit format of Intel chips is an example of this), where B is 64,
D must be at least 21.
So a format specification of E16.9 should suffice for IEEE single.
For IEEE double you should use E25.17e3. And for double extended,
the appropriate format is E30.21e4. Note that there's no difference
between E and D edit descriptors (though the latter may put a D before
the exponent part on output). The fact that there are two edit descriptors
is a holdover from pre-f77 days.
Now all this may not work. There's no guarantee (especially for the
double extended form) that the implementation's run-time I/O library
will convert as accurately as needed. These days it should, but it's
not guaranteed. If the I/O doesn't work, you'll probably have to
some multiprecision package and do the conversion yourself.
--
J. Giles
"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare
| |
| Richard Maine 2008-01-22, 7:27 pm |
| FX <coudert@alussinan.org> wrote:
> It seems rather awkard, then, than there is no simple (or
> even moderately advanced) way of specifying lossless formatted I/O.
> Writing with a large enough format requires knowldge of the number model
> of the processor, and thus isn't portable,
The concept of "lossless" is inherently nonportable, insomuch as the
target processor might not be able to represent the same set of values
at all.
For cases where the concept makes sense, algorithms exist. The OP even
references one. While I haven't personally tried that algorithm, it
sounds to me as though it belies the claim that there isn't a
"moderately advanced" way to do it.
As others have noted, another "moderately advanced" way would be to use
a hex encoding, which would avoid the conversion to decimal. While a
slight bother, there is no "rocket science" about that. That approach
gives something not easy for the human to read (it isn't obvious whether
or noty that is a requirement - might be) and it presumes an underlying
binary representation, but... see above about the concept being
inherently nonportable.
--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
| |
| Dan Nagle 2008-01-22, 7:27 pm |
| Hello,
On 2008-01-22 14:50:15 -0500, "FX" <coudert@alussinan.org> said:
> I was mistaken. It seems rather awkard, then, than there is no simple (or
> even moderately advanced) way of specifying lossless formatted I/O.
You can try using a BOZ format descriptor with a real list item.
It's an extension, most likely will be in f08, but if it's supported
by your compiler, it will allow exact bit pattern copying.
--
--
Cheers!
Dan Nagle
| |
|
| > The concept of "lossless" is inherently nonportable, insomuch as the
> target processor might not be able to represent the same set of values
> at all.
Agreed. I mean "lossless throughout similar number models".
> As others have noted, another "moderately advanced" way would be to use
> a hex encoding
And I should have added "humand readable", which was in my mind but I did
not write. Talk about posting before you proofread!
> For cases where the concept makes sense, algorithms exist. The OP even
> references one. While I haven't personally tried that algorithm, it
> sounds to me as though it belies the claim that there isn't a
> "moderately advanced" way to do it.
I would think that anyone who has thought about implementing D. Gray's
I/O conversion code in a portable way would definitely rate it beyond
moderately advance. But it may just be my programming skills in that
field are poor.
Well, anyway, I would have liked to have such a feature in the language,
but it's not the case and if people feel like it's not useful, I
shouldn't keep arguing the point...
--
FX
| |
| glen herrmannsfeldt 2008-01-22, 7:27 pm |
| FX wrote:
> I was mistaken. It seems rather awkard, then, than there is no simple (or
> even moderately advanced) way of specifying lossless formatted I/O.
Starting with Fortran I, UNFORMATTED I/O would be used when
lossless I/O was needed. It is faster and takes up less space.
Within the Fortran standard it doesn't make sense to specify
lossless. (That is, between implementations.) Different
implementations use different values for base, exponent range,
and the number of significant digits (in the specified base).
Note that Java, which requires IEEE-754 floating point, does
specify lossless conversion.
If you know it is IEEE standard, a hex representation of
the IEEE bit patterns would be reasonable. Otherwise,
a more general hex representation could be used.
-- glen
| |
| Richard Maine 2008-01-22, 7:27 pm |
| FX <coudert@alussinan.org> wrote:
>
> Agreed. I mean "lossless throughout similar number models".
Well, that would require some significant supporting material then if
you expected such a thing in the standard, because the standard doesn't
even define a suitable concept of "similar number models." The standard
precisely defines a concept *MODEL* numbers, but I emphasize the word
"model" because it is important. The model numbers most definitely are
not the same thing as actual representable numbers, which are what you
would need. Nowhere in the standard is there much at all in the way of
description of the actual representable numbers - certainly not in the
kind of detail that would be needed to form the basis of such a
requirement.
Note also that when the f77 standard was written (which was when
list-directed I/O was introduced to the standard), it was not
particularly common for there to be lots of different processors that
would have fit any such definition of similarity. There was wide variety
in word sizes. Even among processors of the same word size, the formats
were not simillar enough for "lossless" conversion to be a sensible
concept. IBM's 32-bit float didn't have the same precision as DEC's
32-bit ones, etc. Heck, DEC had 2 different 64-bit floats on the same
machine; I'd have to pull out the details to check, but I wouldn't count
on them being quite simillar enough for all cases to be converted
losslessly.
In general, if you were expecting the language standard to specify exact
results for conversion of floating point I/O, I think your expectations
are a bit out of sync with the rest of the language. Wouldn't it make
sense to first have specifications that such things as 2.0+2.0 be
computed exactly before requiring exact conversions between different
architectures? Which one of those seems more fundamental to you?
--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
| |
| glen herrmannsfeldt 2008-01-22, 7:27 pm |
| FX wrote:
(snip)
> And I should have added "humand readable", which was in my mind but I did
> not write. Talk about posting before you proofread!
Depending on your definition of human readable.
If you convert the nearest binary floating point representation
of 0.3d0 to a lossless decimal form it is likely to come out
something like 2.9999999999999998e-01, significantly less
human readable than 0.3. (If I have it right, that is IEEE
double 0.3 with 17 decimal digits.)
-- glen
| |
| James Giles 2008-01-22, 7:27 pm |
| glen herrmannsfeldt wrote:
> FX wrote:
> (snip)
>
>
> Depending on your definition of human readable.
>
> If you convert the nearest binary floating point representation
> of 0.3d0 to a lossless decimal form it is likely to come out
> something like 2.9999999999999998e-01, significantly less
> human readable than 0.3. (If I have it right, that is IEEE
> double 0.3 with 17 decimal digits.)
Depends on what you're trying to read. If you want to have some
idea of the actual number that will be used in the computation, the
longer form is more readable.
--
J. Giles
"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare
| |
| Tim Prince 2008-01-22, 7:27 pm |
| FX wrote:
>
> Well, it is the case for gfortran and g95, at least. (For gfortran, some
> system C libraries have trouble correctly dealing with denormals, but
> apart from that, it is working fine.) I am a bit if other
> implementations don't take care of that; I think the default format
> should be one of lossless transmission of information.
>
> In gfortran, the default formats are fixed-width and correspond to:
>
> - 1PG15.8E2 for REAL(4)
> - 1PG25.17E3 for REAL(8)
> - 1PG29.20E4 for REAL(10)
> - 1PG44.35E4 for REAL(16)
>
> I think they're safe.
>
In the papers published around the IEEE754 proposal, an exact formula
was given for the number of digits required to specify a unique binary
value of a given number of bits of precision. Admittedly, the
situations where 8 don't suffice for default real are rare, and 8 is
more suitable as a choice for gfortran default, but the number is 9. 17
is correct for double precision. The IEEE standard doesn't require
available conversion of digits beyond the required number.
I think I recall that the reversibility mentioned here can be guaranteed
only for a limited range, something like 0.1 <= abs(x) <= 1/EPSILON(x).
As I'm on a weak wireless connection at the airport, I'll postpone a
search for exact references until later, if asked. I'm somewhat
surprised that someone didn't already supply them.
| |
| James Giles 2008-01-22, 7:27 pm |
| Tim Prince wrote:
....
> I think I recall that the reversibility mentioned here can be
> guaranteed only for a limited range, something like 0.1 <= abs(x) <=
> 1/EPSILON(x). [...]
No, reversibility is still assured. But, outside a given range, the
requirement that the decimal representation be correctly rounded
is relaxed. It's still required to be within an ULP, but you might
not round the correct direstion. The requirements are chosen to
still guarantee that converting back to binary yields your original
number.
--
J. Giles
"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare
| |
| Gib Bogle 2008-01-23, 4:29 am |
| Gus Gassmann wrote:
> Having all sorts of problems in C/C++ with the conversion in the
> subject line, I thought to try my luck in the Fortran forum.
>
> The situation is this: I have a (large number of) floating point
> number(s), let's assume in double precision --- this might be subject
> to change --- that I have to write to a file in ASCII format (an XML
> file, if that makes a difference) and subsequently read back in (into
> the same type of real that I used before; I'll make sure of this
> somehow).
>
> Are there any tools available in Fortran that will allow me to do the
> conversion from REAL (assume binary) to string (assume decimal) and
> back to REAL _without_loss_of_precision_? (That is, I must be
> guaranteed that the two versions of the real number have the same
> internal representation. Side condition: I'd like to use the smallest
> string representation that will give this guarantee.
>
> There is a paper by David Gay (c. 1990) in which he describes the
> algorithm, but I have not been able to locate the C-code that goes
> with it. What is the situation in the Fortran world? Do I have to
> write my own implementation from scratch?
>
> Thanks for any replies.
Coincidentally, some people that I do coding for occasionally have a
problem of this sort. Their (C++) software does stress analysis of
structures described by finite elements. Sometimes they export geometry
to AutoCAD, make changes to it there, maybe adding substructures, then
import it back again. In this conversion process nodes sometimes have
their coordinates slightly changed, and their code thinks that two nodes
that should be at the same point (and treated as the same node) are two
distinct nodes. Needless to say all kinds of ugly consequences follow.
I haven't delved into the problem, but the work-around that I'm
proposing is to choose a level of granularity, some value dx that is
less than the resolution they care about, and less than the distance
between the closest pair of distinct nodes, then make sure that all
coordinates are adjusted to the nearest multiple of dx when they are
read in. Any obvious flaws in this plan?
|
|
|
|
|