Author weird I/O problem with g77
Crabby Gator

2004-10-12, 3:58 pm

Greetings all,
I have come across a very strange problem whilst compiling a
Fortran program with the gcc/g++/g77 package.
Consider the following:

character*16 x
integer in, out

in = 10

write (x, '(''&'', a4)') in

read (x, '(1x, a4)') out

print *, 'in= ', in
print *, 'out=', out

Basically what the code is doing is using a character array as an
internal binary file to write values in to, which are later extracted.
Obviously there is an implicit assumption that an integer is four bytes
here but let's not worry about that right now. Sometimes the value is
integer, other times it is real. So despite the big/little endian nature
of the underlying works, I expect (and the original developer too) that in
the end (in == out). In fact, that is the behavior on Solaris, digital
fortran, and HP-UX.

HOWEVER,
The g77 compiler on both Cygwin and RH Linux systems produced the
following result:
in = 10
out = 32
I tried v3.3.3 and v3.4.1 of the gcc package with the same result.

I did further experimentation and discovered that the ONLY byte value
that suffers from this behavior is 10 (0xA). The same behavior happens
when the argument in question is a real value that happens to have one
of its byte values = 10. Upon read, a value of 32 is substituted for 10,
and the real is reconstructed - but you obviously get the wrong result!
Thus there is a healthy subset of both integer and real values that will
be mangled by the above code. Since the application that this came from is
a heavily numeric one, this behavior is unacceptable.

I have come up with a couple ways to work around this little problem, but
I'm wondering what the real problem is here. I checked the F77 standard
and nothing caught my eye regarding this. One person has suggested that
the behavior is simply undefined, so mileage may vary.

I read the thread on NEW_LINE() and some of that discussion seemed
relevant. In C/C++ on Win32 systems, you have to specify binary mode when
you open an external file for reading; otherwise line-ending characters are
magically "converted" for you. This seems like something of the same ilk.
Could there be a bug in the underlying C code for g77 that isn't opening
this "file" in the proper mode and thus causing this "conversion" from
0x0A to 0x20 ??

The thoughts of the community are appreciated!

Crabby


John C. Bollinger

2004-10-12, 3:58 pm

Crabby Gator wrote:

> Greetings all,
> I have come across a very strange problem whilst compiling a
> Fortran program with the gcc/g++/g77 package.
> Consider the following:
>
> character*16 x
> integer in, out
>
> in = 10
>
> write (x, '(''&'', a4)') in


You are formatting an integer with a character edit descriptor. G77 is
unlikely to be the only compiler for which this causes odd / unexpected
behavior. The data item is likely being interpreted as Hollerith data,
which, if you're lucky, may do something close to what you want, but I
don't know any reason why it would be _expected_ to do what you want at all.

> read (x, '(1x, a4)') out
>
> print *, 'in= ', in
> print *, 'out=', out
>
> Basically what the code is doing is using a character array as an
> internal binary file to write values in to, which are later extracted.


If you want binary internal storage then what's wrong with the integer
scalar or array that you start with? It's already binary (if you're
using a binary computer), and you cannot get any more efficient in
either space or speed. As my CVF manual puts it: "Formatted, sequential
WRITE statements translate data _from_ binary to character form [...]"
(Emphasis mine.)

> Obviously there is an implicit assumption that an integer is four bytes
> here but let's not worry about that right now. Sometimes the value is
> integer, other times it is real. So despite the big/little endian nature
> of the underlying works, I expect (and the original developer too) that in
> the end (in == out). In fact, that is the behavior on Solaris, digital
> fortran, and HP-UX.
>
> HOWEVER,
> The g77 compiler on both Cygwin and RH Linux systems produced the
> following result:
> in = 10
> out = 32
> I tried v3.3.3 and v3.4.1 of the gcc package with the same result.
>
> I did further experimentation and discovered that the ONLY byte value
> that suffers from this behavior is 10 (0xA). The same behavior happens
> when the argument in question is a real value that happens to have one
> of its byte values = 10. Upon read, a value of 32 is substituted for 10,
> and the real is reconstructed - but you obviously get the wrong result!


10 (decimal) is the ASCII code for the line feed character. 32
(decimal) is the ASCII code for a space character. You are using
formatted writes and reads to encode and decode the data. I don't find
it at all surprising that any particular Fortran runtime library would
do as you describe. It is entirely possible that _any_ attempt to
output a linefeed character via formatted I/O would result in such a
translation.
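
To see where the byte value 10 comes from, here is a minimal sketch (not from the original posts) that prints the individual bytes of the integer. It assumes a Fortran 90 or later compiler, free-form source, and a 4-byte default integer; the program and variable names are made up for illustration.

```fortran
program show_bytes
  implicit none
  integer :: in, i
  character(len=1) :: b(4)   ! assumption: the default integer occupies 4 bytes

  in = 10
  ! Reinterpret the integer's storage as four one-character elements.
  b = transfer(in, b)
  ! Print the character code of each byte in storage order.
  print '(4i5)', (ichar(b(i)), i = 1, 4)
end program show_bytes
```

On a little-endian machine the first code printed is 10 and the rest are 0; on a big-endian machine the 10 comes last. Either way, one of the four "characters" handed to the A4 edit descriptor is a line feed.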

> Thus there is a healthy subset of both integer and real values that will
> be mangled by the above code. Since the application that this came from is
> a heavily numeric one, this behavior is unacceptable.
>
> I have come up with a couple ways to work around this little problem, but
> I'm wondering what the real problem is here. I checked the F77 standard
> and nothing caught my eye regarding this. One person has suggested that
> the behavior is simply undefined, so mileage may vary.


As far as I can tell, that person is correct. Now, g77 does support the
Z edit descriptor (a Fortran 90 feature for hexadecimal I/O), which may
do something sufficiently similar to what you're looking for, but I'm
not certain whether in general the behavior of that edit descriptor for
real (as opposed to integer) I/O items is standardized.
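
A minimal sketch of the Z edit descriptor idea for the integer case only, assuming a 32-bit default integer (hence eight hexadecimal digits) and free-form source. The program name is hypothetical, and nothing here addresses the open question about real items.

```fortran
program z_roundtrip
  implicit none
  character(len=16) :: x
  integer :: in, out         ! assumption: 32-bit default integer (8 hex digits)

  in = 10
  ! Encode the integer as eight hexadecimal digits after the '&' marker.
  write (x, '(''&'', z8.8)') in
  ! Skip the marker and decode the hexadecimal field.
  read (x, '(1x, z8)') out

  print *, 'x   = ', x
  print *, 'in  = ', in
  print *, 'out = ', out
end program z_roundtrip
```

The field then holds only the characters 0-9 and A-F, so no control characters ever pass through formatted I/O. The cost is eight characters per integer instead of four.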

> I read the thread on NEW_LINE() and some of that discussion seemed
> relevant. In C/C++ on Win32 systems, you have to specify binary mode when
> you open an external file for reading; otherwise line-ending characters are
> magically "converted" for you. This seems like something of the same ilk.
> Could there be a bug in the underlying C code for g77 that isn't opening
> this "file" in the proper mode and thus causing this "conversion" from
> 0x0A to 0x20 ??


See above. I am confident that the behavior you observe is intentional,
not a bug, and I can see at least one excellent argument in favor of it. Do
note that Unix-like systems do not make the same kind of distinction
between binary and text files that Windows does, so the specific
potential issue you raise is not even relevant unless you're using g77
on Windows.

Your code appears to be nonstandard with respect to the mechanism you
have asked about, thus it is unsurprising that you would run into a
compiler / runtime system on which it fails. As portability seems to be
important for this code, you would do well to find a standard solution
for the problem. You have not given a broad enough picture of what you
are trying to do with this feature for me to guess what a better,
standard conformant way of approaching the problem would be. Maybe some
of the others around here are more insightful.


John Bollinger
jobollin@indiana.edu


meek@skyway.usask.ca

2004-10-12, 3:58 pm

In a previous article, Crabby Gator <Crabby@Gators.com> wrote:
>Greetings all,
>I have come across a very strange problem whilst compiling a
>Fortran program with the gcc/g++/g77 package.
<snip>
>The g77 compiler on both Cygwin and RH Linux systems produced the
>following result:
> in = 10
> out = 32
<snip>

... my guess (I don't use g77) is that 0xA (which is
the ASCII control character for linefeed) is taken as the
end of a line in your *formatted* read statement - i.e.
it is not used in the value you are reading.
That doesn't explain the 32 (space). It seems you should
(little endian anyway) get 38 (the decimal code for &)

Chris

Jugoslav Dujic

2004-10-12, 3:58 pm

m@skyway.usask.ca wrote:
| In a previous article, Crabby Gator <Crabby@Gators.com> wrote:
|| Greetings all,
|| I have come across a very strange problem whilst compiling a
|| Fortran program with the gcc/g++/g77 package.
<snip>
|| I read the thread on NEW_LINE() and some of that discussion seemed
|| relevant. In C/C++ on Win32 systems, you have to specify binary mode when
|| you open an external file for reading; otherwise line-ending characters are
|| magically "converted" for you. This seems like something of the same ilk.
|| Could there be a bug in the underlying C code for g77 that isn't opening
|| this "file" in the proper mode and thus causing this "conversion" from
|| 0x0A to 0x20 ??
||
|| The thoughts of the community are appreciated!
||
|| Crabby
||
||
| ... my guess (I don't use g77) is that 0xA (which is
| the ASCII control character for linefeed) is taken as the
| end of a line in your *formatted* read statement - i.e.
| it is not used in the value you are reading.
| That doesn't explain the 32 (space). It seems you should
| (little endian anyway) get 38 (the decimal code for &)

But now, we're trying to reverse-engineer how g77's I/O library works...

I render Crabby's post as question whether it's a bug in there,
or the Standard leaves this as "undefined"/"processor-dependent".
I don't agree with John Bollinger's interpretation, as (AFAIK),
it is allowed to use A edit descriptor for non-characters
(providing, in effect, a kind of "binary" I/O).

--
Jugoslav
___________
www.geocities.com/jdujic

Please reply to the newsgroup.
You can find my real e-mail on my home page above.

Dr Ivan D. Reid

2004-10-12, 3:58 pm

On Tue, 12 Oct 2004 16:53:18 +0200, Jugoslav Dujic <jdujic@yahoo.com>
wrote in <2t29ebF1putr5U1@uni-berlin.de>:

> I render Crabby's post as question whether it's a bug in there,
> or the Standard leaves this as "undefined"/"processor-dependent".
> I don't agree with John Bollinger's interpretation, as (AFAIK),
> it is allowed to use A edit descriptor for non-characters
> (providing, in effect, a kind of "binary" I/O).


My CVF Language Reference flags it as an extension to output non-
character data as Hollerith.

--
Ivan Reid, Electronic & Computer Engineering, ___ CMS Collaboration,
Brunel University. Ivan.Reid@brunel.ac.uk Room 40-1-B12, CERN
KotPT -- "for stupidity above and beyond the call of duty".

Richard E Maine

2004-10-12, 3:58 pm

"John C. Bollinger" <jobollin@indiana.edu> writes:

> You are formatting an integer with a character edit descriptor. G77
> is unlikely to be the only compiler for which this causes odd /
> unexpected behavior.


On some compilers, this will probably cause the not-so-odd behavior of
aborting with an error at either compile time or run time. This has
been nonstandard starting with f77. For example, I just tried the
following program

program hollerith
write (*,'(a4)') 123
end

and it gave me the run-time error message

Invalid edit descriptor for integer i/o-list item
Program terminated by fatal I/O error

Admittedly, using that particular compiler's -dusty flag does allow
the code to run, but there is no guarantee that every compiler will
necessarily have such an option.

Ouch. While it can obviously be made to work for some (perhaps even most)
compilers, there are an awful lot of potential problems here. This would
be pretty low down on my list of candidate approaches. I'd have to be
pretty desperate before I got that far down on the list. Yes, I've
done it, when working with 3rd-party code that already did it and
wasn't reasonable for me to redo (which might be the situation you
are in), but it never makes me comfortable.

Formatted I/O can conceivably do all kinds of strange things, some of
which I've seen from time to time. It isn't designed for binary
application, so attempting to use it for that can give surprises.
Think of possibilities like stripping the high bit from every byte, to
name just one. I have definitely seen binary data get "abused"
like that by mislabelling it as formatted; I think that was in
doing something like formatted ftp or something of the sort, rather
than from Fortran formatted I/O, but the same principle applies - it
could happen.
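
One way to move the raw bytes of a number in and out of a character variable without formatted I/O is the TRANSFER intrinsic. It is a Fortran 90 feature, so it may not be available in g77. The following is a minimal sketch rather than a recommendation from the thread, and it assumes a 4-byte default integer and free-form source; the names are illustrative.

```fortran
program transfer_roundtrip
  implicit none
  character(len=4) :: bytes   ! assumption: the default integer occupies 4 bytes
  integer :: in, out

  in = 10
  ! Copy the integer's bit pattern into the character variable unchanged.
  bytes = transfer(in, bytes)
  ! Copy the same bit pattern back into an integer.
  out = transfer(bytes, out)

  print *, 'in  = ', in
  print *, 'out = ', out
  ! No edit descriptor is involved, so no byte is reinterpreted as text.
  if (out /= in) stop 1
end program transfer_roundtrip
```

The same pattern works for a real value of the same size by declaring in and out as real. Because the copy never goes through an edit descriptor, a byte that happens to equal 10 is just a byte.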

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain | experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain

Richard E Maine

2004-10-12, 3:58 pm

"Jugoslav Dujic" <jdujic@yahoo.com> writes:

> I don't agree with John Bollinger's interpretation, as (AFAIK),
> it is allowed to use A edit descriptor for non-characters
> (providing, in effect, a kind of "binary" I/O).


Gee. I usually think of you as more of an f90 type of person than
an f66 one. There are a few people here who are prone to remember
f66 better than f90, but I had you in a different group. :-)

This was allowed in f66, but not in f77 or later. In f2003, see
the first sentence of 10.6.3.

"The A[w] edit descriptor is used with an input/output list item
of type character."

Hmm. It is sort of unfortunate that the standard doesn't use the
word "shall" in there somewhere like it does for other edit descriptors.
I suppose one might consider that to leave the question open, but I
don't think you'd find it stayed open very long, insomuch as

1. The standard has no interpretation of what it would mean to use
anything other than a character. (I know that you and I can
come up with a meaning, but the standard doesn't). That's one
of the "last resort" outs for such interpretations. See the first
sentence of 1.5 on "conformance". I don't like to have to resort
to that reasoning. It is "nicer" to find an explicit prohibition.
But that first sentence of 1.5 is part of the standard (and a
very important one), so it does count.

2. Appendix C of the f77 standard explicitly lists that usage as one
of the things removed from f77 as part of the removal of Hollerith.
See C.6 of the f77 standard. That would seem pretty good for
establishing intent.

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain | experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain

glen herrmannsfeldt

2004-10-12, 3:58 pm



Jugoslav Dujic wrote:

(snip)

> I render Crabby's post as question whether it's a bug in there,
> or the Standard leaves this as "undefined"/"processor-dependent".
> I don't agree with John Bollinger's interpretation, as (AFAIK),
> it is allowed to use A edit descriptor for non-characters
> (providing, in effect, a kind of "binary" I/O).


It is allowed in Fortran 66, and so in any compiler that
claims backward compatibility to Fortran 66. Some supply
a flag or option to turn on such backward compatibility.

The early Fortran 77 compilers pretty much had to support it,
as there wasn't much new code written yet. The VAX/VMS
linker even has special code to support it, as it converts
Hollerith (apostrophed string constants) as subroutine
arguments from pass by descriptor to pass by reference if
the dummy argument is not a CHARACTER variable.

Assuming one supports it at all, I do find the conversion
of X'0A' to X'20' strange.

I don't remember if it is standard, but some compilers I know
would accept CHARACTER arrays for internal READ/WRITE as
multiple records, in the same way that Fortran would normally
print multiple lines, either a / format separator, or reaching
the end of the FORMAT list.

Does g77 also convert X'0A' to X'20' writing CHARACTER variables
to files?

-- glen

Richard E Maine

2004-10-12, 3:58 pm

glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:

> I don't remember if it is standard, but some compilers I know
> would accept CHARACTER arrays for internal READ/WRITE as
> multiple records, in the same way that Fortran would normally
> print multiple lines, either a / format separator, or reaching
> the end of the FORMAT list.


Yes, that is in the standard (f77 and after).
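
A small sketch of the multiple-record behavior described in the quoted text: a CHARACTER array used as an internal file, with each element acting as one record. The program and variable names are illustrative only, and the source is free-form.

```fortran
program internal_records
  implicit none
  character(len=8) :: buf(2)   ! each array element is one record
  integer :: a, b

  ! The slash ends the first record, so 34 goes into buf(2).
  write (buf, '(i4 / i4)') 12, 34
  print '(3a)', '[', buf(1), ']'
  print '(3a)', '[', buf(2), ']'

  ! Reading walks the records in the same order.
  read (buf, '(i4 / i4)') a, b
  print *, a, b
end program internal_records
```

After the write, buf(1) holds the field for 12 and buf(2) the field for 34, each padded with blanks to the element length.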

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain | experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain

Tim Prince

2004-10-12, 8:56 pm


"Richard E Maine" <nospam@see.signature> wrote in message
news:m1is9fdgeu.fsf@MLMCE0000L22801.local...
> "Jugoslav Dujic" <jdujic@yahoo.com> writes:
>
>
> Gee. I usually think of you as more of an f90 type of person than
> an f66 one. There are a few people here who are prone to remember
> f66 better than f90, but I had you in a different group. :-)
>
> This was allowed in f66, but not in f77 or later. In f2003, see
> the first sentence of 10.6.3.
>
> "The A[w] edit descriptor is used with an input/output list item
> of type character."
>

But very few f66 compilers had an internal read and write extension in the
f77 form, so I would be surprised to find many new or old compilers which
ever supported such a combination of f77 practice with stuff which was
already ruled out in f77.


glen herrmannsfeldt

2004-10-12, 8:56 pm



Tim Prince wrote:


> But very few f66 compilers had an internal read and write extension in the
> f77 form, so I would be surprised to find many new or old compilers which
> ever supported such a combination of f77 practice with stuff which was
> already ruled out in f77.


WATFIV had it before 1974, CHARACTER variables and all.

Most of the path for internal I/O is likely the same as external,
to avoid a lot of code duplication.

Otherwise, in the F66 days there was a routine that would modify
the library code to allow a specified unit number to write to
or read from a supplied array.


CALL FIO999(NUNIT,BUFLOC,IRECLG,IBUFLG,&nnn)

NUNIT (INTEGER) "Unit number" to be associated with a core-memory
buffer area.
BUFLOC (array name) First word of buffer area (must be located
on a full word boundary).

IRECLG (INTEGER) Length in fullwords of each record in the buffer.
IBUFLG (INTEGER) Length in fullwords of the entire buffer
(must be an integral multiple of IRECLG).
nnn Statement number to which control will be returned
if errors detected in the other parameters.

-- glen


Copyright 2008 codecomments.com