
NEW_LINE()
James Giles

2004-10-01, 8:56 pm

I just noticed that the new standard (F2003) overspecifies
formatted stream I/O by always interpreting the record
structure of the file instead of just passing the characters
found straight through. For example, if a file happens to
contain (as text data) something that the I/O library would
treat as a record mark if the file were opened as a non-stream
formatted file, the I/O library consumes that data and passes
through one character that matches the result of the NEW_LINE
intrinsic. Or, if your output data contains (as text data) one
of those NEW_LINE characters, that data will not be written
and an implementation dependent record mark will be output
instead.
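A small C sketch of the output half of that behaviour may make it concrete. For illustration only, it assumes that NEW_LINE() is '\n' and that the processor's record mark is a CR LF pair. emulate_formatted_stream_write is a hypothetical name and is not part of any Fortran run-time library. Any '\n' in the caller's data is replaced instead of being written through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative record mark: CR LF, as on DOS/Windows. A real Fortran
   processor chooses its own (implementation-dependent) record mark. */
static const char RECORD_MARK[] = "\r\n";

/* Emulates the output side of formatted stream I/O as described above:
   each '\n' (standing in for NEW_LINE()) in data[0..n) is replaced by the
   record mark instead of being written unchanged. out must have room for
   2 * n bytes. Returns the number of bytes stored in out. */
size_t emulate_formatted_stream_write(const char *data, size_t n, char *out)
{
    size_t mark_len = strlen(RECORD_MARK);
    size_t k = 0;

    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (data[i] == '\n') {
            memcpy(out + k, RECORD_MARK, mark_len);  /* data byte is lost */
            k += mark_len;
        } else {
            out[k++] = data[i];
        }
    }
    return k;
}
```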

I guess someone pointed this out to me before, but I thought
the NEW_LINE intrinsic was merely informative (and easy
to ignore). I didn't realize that the stream I/O itself didn't
move the data unchanged.

Well, seems we still need to add a new feature to support *real*
stream I/O.

--
J. Giles

"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare


Richard E Maine

2004-10-01, 8:56 pm

"James Giles" <jamesgiles@worldnet.att.net> writes:

> I just noticed that the new standard (F2003) overspecifies
> formatted stream I/O by always interpreting the record
> structure of the file instead of just passing the characters
> found straight through. [describes what he means]
> Well, seems we still need to add a new feature to support *real*
> stream I/O.


A few notes.

1. This is basically modelled directly after C, stream I/O having been
introduced under the guise of a C interop feature.

Please note that I'm *NOT* necessarily saying that I like it.
I'm just pointing out where this came from.

2. Unformatted streams are "real" in the sense that the data passes
through completely transparently. Initially, unformatted streams
are all that I proposed, but other people also wanted formatted
ones.

You could even use unformatted streams to make formatted files
on pretty much all current systems. It is system-dependent
in that you'd have to add the right newline characters (and
even more so in that the file system has to be one where adding
control characters to the end of each record is how to make a
formatted file - but that's how most current systems are, whether
I like it or not). Anyway, on current systems, one could build
up other things based on unformatted streams.
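The same idea can be sketched in C, where a stream opened in binary mode plays the role of a Fortran unformatted stream. In this version, the program decides which bytes end each line, not the library. write_text_record is a hypothetical helper. The CR LF terminator used in the test is only one example of a system-dependent choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes one "record" of text to a stream opened in binary mode, followed
   by the terminator chosen by the caller (e.g. "\n" for Unix, "\r\n" for
   DOS/Windows). Nothing is translated: the bytes go out exactly as given.
   Returns 0 on success, -1 on a short write. */
int write_text_record(FILE *f, const char *text, const char *terminator)
{
    size_t tlen = strlen(text), mlen = strlen(terminator);

    assert(f != NULL);
    if (fwrite(text, 1, tlen, f) != tlen) return -1;
    if (fwrite(terminator, 1, mlen, f) != mlen) return -1;
    return 0;
}
```

tmpfile() opens its file in "wb+" mode, so it is a convenient binary stream for trying this out.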

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain | experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain


James Giles

2004-10-01, 8:56 pm

Richard E Maine wrote:
> "James Giles" <jamesgiles@worldnet.att.net> writes:
>
>
> A few notes.
>
> 1. This is basically modelled directly after C, stream I/O having been
> introduced under the guise of a C interop feature.
>
> Please note that I'm *NOT* necessarily saying that I like it.
> I'm just pointing out where this came from.


I can't find a corresponding requirement in the C standard.
I think *some* I/O functions in C may do something similar
(it's hard to tell), but that's independent of whether formatting
is applied or not. §5.2.1 of the C standard (I'm actually
looking at a draft of C99 dated Aug. 3, 1998, WG14/N843)
says that the execution character set contains a new-line
character, but doesn't say this character is the result of a
conversion of any implementation dependent file convention
(though it does explicitly say that for the new-line in the
source file character set - they regard the character set used
for source files and that of the execution environment as
possibly distinct). I take that to mean that such a conversion
is not intended or allowed. I might be wrong.

> 2. Unformatted streams are "real" in the sense that the data passes
> through completely transparently. Initially, unformatted streams
> are all that I proposed, but other people also wanted formatted
> ones.
>
> You could even use unformatted streams to make formatted files
> on pretty much all current systems. It is system-dependent
> in that you'd have to add the right newline characters (and
> even more so in that the file system has to be one where adding
> control characters to the end of each record is how to make a
> formatted file - but that's how most current systems are, whether
> I like it or not). Anyway, on current systems, one could build
> up other things based on unformatted streams.


Ok. So the upshot of this is that stream I/O is only *really*
stream I/O if you open it for unformatted and then do your
format conversions on internal files based on the buffers you
transmit unformatted. As I said, the standard overspecified:
too much stuff when the really needed functionality is much
smaller and simpler.
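In C terms, this approach means reading raw bytes through a binary stream and then parsing them from memory. That is roughly what a Fortran READ from an internal file does. The sketch below works under those assumptions. read_ints_raw is hypothetical, and a zero byte in the data would stop the scan early.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads up to cap-1 raw bytes from a binary stream (no record-mark
   processing), then "internally reads" integers from that buffer, the C
   analogue of a Fortran READ on an internal file. Stores at most max_vals
   values in vals and returns how many were stored. */
size_t read_ints_raw(FILE *f, char *buf, size_t cap, long *vals, size_t max_vals)
{
    size_t n, count = 0;
    char *p, *end;

    assert(f != NULL && cap > 0);
    n = fread(buf, 1, cap - 1, f);   /* bytes pass through untouched */
    buf[n] = '\0';                   /* terminate so strtol can scan it */
    p = buf;
    while (count < max_vals) {
        long v = strtol(p, &end, 10);   /* skips spaces, CR and LF alike */
        if (end == p) break;            /* no further number */
        vals[count++] = v;
        p = end;
    }
    return count;
}
```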

--
J. Giles


glen herrmannsfeldt

2004-10-01, 8:56 pm



James Giles wrote:

> Richard E Maine wrote:



I don't know about real stream I/O. IBM's MVS (and related systems),
VM/CMS and DEC/Compaq/HP's VMS (in some record formats) don't use
control characters to terminate records. They use either fixed length
records, or variable length records containing a block header with the
block length.

In the case of C, the library converts between newline terminated
lines and the record structure of the file system.
> I can't find a corresponding requirement in the C standard.
> I think *some* I/O functions in C may do something similar
> (it's hard to tell), but that's independent of whether formatting
> is applied or not. §5.2.1 of the C standard (I'm actually
> looking at a draft of C99 dated Aug. 3, 1998, WG14/N843)
> says that the execution character set contains a new-line
> character, but doesn't say this character is the result of a
> conversion of any implementation dependent file convention
> (though it does explicitly say that for the new-line in the
> source file character set - they regard the character set used
> for source files and that of the execution environment as
> possibly distinct). I take that to mean that such a conversion
> is not intended or allowed. I might be wrong.


Well, C makes a distinction between text files and binary files,
though the two are the same under unix-like systems. It doesn't
have to do with the use of formatting, but with conversion of newline
characters. Under DOS/Windows on writing a text file each newline
character is converted to two characters, CR and LF (X'0d' and X'0a')
and converted back on input. A file opened in binary does not have
the conversions done.
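The input half of that DOS/Windows text-mode conversion can be emulated in memory. This is a simplified sketch: crlf_to_lf is a hypothetical name, and a real C library's text mode may differ in details. This version leaves a lone CR unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Emulates, in memory, what a DOS/Windows C library does when reading a
   text stream: each CR LF pair becomes a single '\n'. Converts buf[0..n)
   in place and returns the new length. A stream opened in binary mode
   would skip this step entirely. */
size_t crlf_to_lf(char *buf, size_t n)
{
    size_t r, w = 0;

    assert(buf != NULL || n == 0);
    for (r = 0; r < n; r++) {
        if (buf[r] == '\r' && r + 1 < n && buf[r + 1] == '\n')
            continue;            /* drop the CR; the LF is copied next */
        buf[w++] = buf[r];
    }
    return w;
}
```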



Except ones like I described above.
> Ok. So the upshot of this is that stream I/O is only *really*
> stream I/O if you open it for unformatted and then do your
> format conversions on internal files based on the buffers you
> transmit unformatted. As I said, the standard overspecified:
> too much stuff when the really needed functionality is much
> smaller and simpler.


How about Fortran implementations of getchar(), putchar(),
printf() and scanf()?

(Just kidding.)

-- glen



James Giles

2004-10-01, 8:56 pm

glen herrmannsfeldt wrote:
....
> I don't know about real stream I/O. IBM's MVS (and related systems),
> VM/CMS and DEC/Compaq/HP's VMS (in some record formats) don't use
> control characters to terminate records. They use either fixed length
> records, or variable length records containing a block header with the
> block length.
>
> In the case of C, the library converts between newline terminated
> lines and the record structure of the file system.


I just read the appropriate parts of the C standard and I find no
mention of that. The description of fscanf, for example, says
nothing about it. Nor can I see any reason (in any language)
to use stream I/O on text files, instead of record based I/O,
if the data on the stream is modified in that way. It eliminates
the whole value of the concept.

--
J. Giles


Walter Spector

2004-10-02, 3:56 am

James Giles wrote:
>
> glen herrmannsfeldt wrote:
> ...
>
> I just read the appropriate parts of the C standard and I find no
> mention of that. The description of fscanf, for example, says
> nothing about it. Nor can I see any reason (in any language)
> to use stream I/O on text files, instead of record based I/O,
> if the data on the stream is modified in that way. It eliminates
> the whole value of the concept.


In the C 'fopen' library routine, one can add the letter 'b' to the
file mode to turn on 'binary' mode. The default is 'text' mode.
In my copy of C89 (November 9, 1987 draft) this is described in
§4.9.2.

As Glen mentioned, this has little meaning on unix-like systems.
But on other systems can be quite important. I first encountered
this in the C compiler for the long-defunct Cray-1 Operating System
(COS). It used control words to delineate records - even for simple
text datasets. So one had to set the 'b' in the mode to get a true
binary stream.

Walt
-...-
Walt Spector
(w6ws at earthlink dot net)


James Giles

2004-10-02, 3:56 am

Walter Spector wrote:
> James Giles wrote:
>
> In the C 'fopen' library routine, one can add the letter 'b' to the
> file mode to turn on 'binary' mode. The default is 'text' mode.
> In my copy of C89 (November 9, 1987 draft) this is described in
> §4.9.2.



OK. Now I see it. But, as I said before, that means the decision
is independent of whether you use formatting on the file or not.
As Fortran 2003's implementation of streams works, it introduces
no functionality that non-advancing formatted I/O doesn't already
provide in F95.

--
J. Giles


Ken Plotkin

2004-10-02, 3:56 pm

On 01 Oct 2004 15:27:45 -0700, Richard E Maine <nospam@see.signature>
wrote:


>A few notes.
>
>1. This is basically modelled directly after C, stream I/O having been
> introduced under the guise of a C interop feature.


That is unfortunate. Why is there a fetish about interoperating with
C? Why not a push to interoperate with BASIC, COBOL, Lisp, etc.?
Even more, since C came after Fortran (as well as a bunch of other
languages), why does it not fall on the C community to ensure
interoperation with everyone else?

> Please note that I'm *NOT* necessarily saying that I like it.
> I'm just pointing out where this came from.


Your place in heaven is ensured. :-)

>2. Unformatted streams are "real" in the sense that the data passes
> through completely transparently. Initially, unformatted streams
> are all that I proposed, but other people also wanted formatted
> ones.


The committee is fairly democratically run, isn't it? And its members
tend to be nice to each other? That, unfortunately, leads to bloated
and ambiguous results.

> You could even use unformatted streams to make formatted files
> on pretty much all current systems. It is system-dependent
> in that you'd have to add the right newline characters (and

[snip]

I've done that often enough. As long as there is unformatted stream,
you can do whatever you want. Your program prepares exactly the bits
you want to go out, and unformatted stream delivers those.

I'm not sure I want to hear the reasoning behind formatted stream.

Ken Plotkin


Jan Vorbrüggen

2004-10-05, 9:00 am

> I'm not sure I want to hear the reasoning behind formatted stream.

You do know that the FTP protocol makes the distinction between
transferring a file in binary or in ASCII mode, do you not? This
is exactly the same distinction.

Jan


Jan Vorbrüggen

2004-10-05, 9:00 am

> As Fortran 2003's implementation of streams works, it introduces
> no functionality that non-advancing formatted I/O doesn't already
> provide in F95.


It does - stream format does not have any record length limitations.

Jan


Jan Vorbrüggen

2004-10-05, 9:00 am

> I just noticed that the new standard (F2003) overspecifies
> formatted stream I/O by always interpreting the record
> structure of the file instead of just passing the characters
> found straight through. For example, if a file happens to
> contain (as text data) something that the I/O library would
> treat as a record mark if the file were opened as a non-stream
> formatted file, the I/O library consumes that data and passes
> through one character that matches the result of the NEW_LINE
> intrinsic. Or, if your output data contains (as text data) one
> of those NEW_LINE characters, that data will not be written
> and an implementation dependent record mark will be output
> instead.


And that is exactly as it should be. For instance, it makes it
easy for Steve Lionel to add a new CONVERT option to the OPEN
statement that will allow a program to transparently read or
write a text file for Mac, Windows or Unix use, as the case may
be. C does not hide the implementation-dependent NEW_LINE behind
this curtain.

Jan


James Giles

2004-10-05, 8:58 pm

Jan Vorbrüggen wrote:
>
> It does - stream format does not have any record length limitations.



Neither does non-advancing I/O.

--
J. Giles


James Giles

2004-10-05, 8:58 pm

Jan Vorbrüggen wrote:
>
> And that is exactly as it should be. [...]


I disagree. If the file is *not* a system text file, but has ASCII
character data on it, what I really want (and what I thought stream
I/O was going to give me) was the ability to do formatted I/O
on such a file without worrying about the system or the run-time
library changing any of the information. Instead, anything that
looks like an implementation-dependent record mark will be converted
into a NEW_LINE() character on input, and on output, anything
in my data that happens to be the NEW_LINE() character will
be turned into an implementation-dependent record mark. I will
*still* not be able to do formatted I/O on genuine streams of
data.

What the F2003 so-called formatted stream I/O does is just
introduce a new (and more complicated) way to do what
non-advancing I/O already does. For one thing, non-advancing
I/O doesn't require the user to know (or care) about record marks
as such: there never needed to be a NEW_LINE() intrinsic to
support non-advancing I/O. Non-advancing input tells you that
you reached the record mark, and where it is, without requiring
you to scan for it. And so on.

Oh well, I can do unformatted stream I/O with CHARACTER
variables, and then do format conversion on those as internal
files. This assumes that the "file storage unit" for unformatted
stream I/O is the same size as a character (or less) so that
the "stream" won't assume (or insert) padding during the
I/O process. The whole issue complicates uses of the
feature. :-(

--
J. Giles


glen herrmannsfeldt

2004-10-05, 8:58 pm



James Giles wrote:

(snip)

> I disagree. If the file is *not* a system text file, but has ASCII
> character data on it, what I really want (and what I thought stream
> I/O was going to give me) was the ability to do formatted I/O
> on such a file without worrying about the system or the run-time
> library changing any of the information. Instead, anything that
> looks like an implementation-dependent record mark will be converted
> into a NEW_LINE() character on input, and on output, anything
> in my data that happens to be the NEW_LINE() character will
> be turned into an implementation-dependent record mark. I will
> *still* not be able to do formatted I/O on genuine streams of
> data.


C will do exactly that with a text file. If you don't want
anything done to it you need to open it as a binary file.
(How many Fortran systems use a C library underneath?)

C has text and binary files, Fortran has formatted and unformatted,
so it would seem that you want an unformatted stream file.

> Oh well, I can do unformatted stream I/O with CHARACTER
> variables, and then do format conversion on those as internal
> files. This assumes that the "file storage unit" for unformatted
> stream I/O is the same size as a character (or less) so that
> the "stream" won't assume (or insert) padding during the
> I/O process. The whole issue complicates uses of the
> feature. :-(


If you want unformatted (binary) stream, I think that has to be
the case. Consider the PDP-10 with 36 bit words, five ASCII
characters per word in text files. If you want to write a
binary file you need the ability to set or read all 36 bits.
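The arithmetic behind that point: five 7-bit characters use 35 of a word's 36 bits, so a character-by-character view of a word can never reach every bit. The C sketch below holds a 36-bit word in a uint64_t. It assumes the usual PDP-10 text layout: characters left-justified, with the lowest-order bit unused. pack5 and unpack5 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Low 36 bits of a uint64_t stand in for one PDP-10 word. */
#define WORD_MASK ((UINT64_C(1) << 36) - 1)

/* Packs five 7-bit characters, the first in the top 7 bits of the word;
   the lowest-order bit (bit 35 in PDP-10 numbering) is left as zero. */
uint64_t pack5(const char c[5])
{
    uint64_t w = 0;
    for (int i = 0; i < 5; i++)
        w |= (uint64_t)(c[i] & 0x7F) << (29 - 7 * i);
    return w & WORD_MASK;
}

/* Inverse of pack5; the unused low bit is simply ignored. */
void unpack5(uint64_t w, char c[5])
{
    assert((w & ~WORD_MASK) == 0);   /* must fit in 36 bits */
    for (int i = 0; i < 5; i++)
        c[i] = (char)((w >> (29 - 7 * i)) & 0x7F);
}
```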

On a byte addressed machine I think you are safe until CHARACTER
variables use Unicode. Maybe arrays of one-byte integers?

-- glen



James Giles

2004-10-05, 8:58 pm

glen herrmannsfeldt wrote:
> James Giles wrote:
> [...] I will
>
> C will do exactly that with a text file. If you don't want
> anything done to it you need to open it as a binary file.
> (How many Fortran systems use a C library underneath?)
>
> C has text and binary files, Fortran has formatted and unformatted,
> so it would seem that you want an unformatted stream file.


No. I want to do formatted I/O. I don't want record marks
processed.

In C, the distinction between formatted and unformatted is
*independent* of the distinction between binary and text streams.
In C, I don't recall ever using text streams in my own code. I don't
use C itself much any more. I had forgotten that there even was such a
distinction in C until this thread.

In any case, my only use for formatted stream I/O (as distinct
from formatted nonadvancing I/O, which I already use for
arbitrary length record handling) would be to avoid the automatic
processing of record marks by the system or run-time library.
Since the F2003 standard makes streams redundant with nonadvancing
I/O on formatted files, what's my motivation for using them?

> If you want unformatted (binary) stream, [...]


You are still assuming (incorrectly, I believe) that C's "binary"
is a synonym for Fortran's "unformatted". It's not a valid association
in C, which has formats too.

--
J. Giles


glen herrmannsfeldt

2004-10-05, 8:58 pm



James Giles wrote:

(snip)

> No. I want to do formatted I/O. I don't want record marks
> processed.


> In C, the distinction between formatted and unformatted is
> *independent* of the distinction between binary and text streams.
> In C, I don't recall ever using text streams in my own code. I don't
> use C itself much any more. I had forgotten that there even was such a
> distinction in C until this thread.


Text streams are the default, unless you add a b to the
second argument to fopen(). It is more likely that you did
use them, than that you didn't.

> In any case, my only use for formatted stream I/O (as distinct
> from formatted nonadvancing I/O, which I already use for
> arbitrary length record handling) would be to avoid the automatic
> processing of record marks by the system or run-time library.
> Since the F2003 standard makes streams redundant with nonadvancing
> I/O on formatted files, what's my motivation for using them?


I believe (someone will correct me if I am wrong) that non-advancing
I/O can still have a record length limit. It is convenient for putting
partial lines together, for example. The library may still have
a line buffer to be written out with a record mark at the end.

You can't portably write text files without record mark processing.
Compilers can always provide non-standard extensions.

> You are still assuming (incorrectly, I believe) that C's "binary"
> is a synonym for Fortran's "unformatted". It's not a valid association
> in C, which has formats too.


C won't stop you from putting text data into fwrite(), or
binary data into printf(), true. fwrite() will write null
characters to the file, which will cause lots of troubles if
read as text.

Consider that list directed output, which doesn't use a FORMAT
statement, is still considered formatted output. C has a lot
of restrictions on text files to allow for the different record
processing of different systems, including those that don't use
characters for record marks.

-- glen


TimC

2004-10-06, 3:59 am

On Wed, 06 Oct 2004 at 00:27 GMT, glen herrmannsfeldt (aka Bruce)
was almost, but not quite, entirely unlike tea:
> James Giles wrote:
>
> Text streams are the default, unless you add a b to the
> second argument to fopen(). It is more likely that you did
> use them, than that you didn't.


In fact, in my compiler of choice (gcc), there is no difference - just
as I think *should* be the case :)

Ah, religious zealotry.

Actually -- I've got a question. Can fortran deal with writing a
string that finishes with spaces, with any of these methods, and those
spaces would actually be written?

I like writing robust scripts and programs, and I still don't like the
fact that I could hypothetically write a file called "foo ", and
fortran compiled programs couldn't open that file.

>
> I believe (someone will correct me if I am wrong) that non-advancing
> I/O can still have a record length limit.


It does :(

After all these years of fortran programming, I *still* don't like
having to put up with bogus line length limitations -- coming from a C
background, I can't see how reading 1500 formatted floats from a
single line in a file needs to invoke a record structure at all --
just scan in each float until you reach a newline, duh. It'd also
solve my problem of reading from a pipe (either a named pipe, or stdin
from another program, or whatever) and finding that the program can't
deal with getting a "short read" at the end of each 512-byte buffer
allocated by the operating system. It's obvious to me that you just
save the last few bytes you read but haven't processed yet, read in
the next batch of data, and then concatenate them, looking for the
next whitespace or newline. You can open a file with a larger RECL
(and I don't know what consequences this has), but even this has
caused problems for me in the past.
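The carry-over recipe above is a standard technique: keep the unprocessed tail of each read, put the next read after it, and keep scanning. A C sketch follows. sum_floats is a hypothetical name, and summing stands in for whatever the program does with each value. The small chunk size stands in for the short reads a pipe can deliver, and the 63-byte token limit is an assumption of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sums whitespace-separated numbers read from f in chunks of at most
   `chunk` bytes (1..256). A token cut off at the end of a chunk is carried
   over and joined with the start of the next chunk. Sketch limits: tokens
   must be shorter than 64 bytes, and strtod results are not error-checked.
   Returns how many values were read; their sum is stored in *sum. */
size_t sum_floats(FILE *f, size_t chunk, double *sum)
{
    char buf[64 + 256];       /* carried-over bytes + one chunk */
    char carry[64], tok[64];
    size_t carry_len = 0, count = 0;

    assert(f != NULL && chunk > 0 && chunk <= 256);
    *sum = 0.0;
    for (;;) {
        memcpy(buf, carry, carry_len);                 /* leftover bytes first */
        size_t got = fread(buf + carry_len, 1, chunk, f);
        size_t len = carry_len + got, i = 0;
        int eof = (got == 0);                          /* no more data */
        carry_len = 0;
        while (i < len) {
            while (i < len && isspace((unsigned char)buf[i])) i++;
            size_t start = i;
            while (i < len && !isspace((unsigned char)buf[i])) i++;
            size_t tlen = i - start;
            if (tlen == 0) break;                      /* only whitespace left */
            assert(tlen < sizeof tok);
            if (i == len && !eof) {                    /* token may continue */
                memcpy(carry, buf + start, tlen);
                carry_len = tlen;
                break;
            }
            memcpy(tok, buf + start, tlen);
            tok[tlen] = '\0';
            *sum += strtod(tok, NULL);
            count++;
        }
        if (eof) return count;
    }
}
```

With a chunk size of 3, most numbers in a short test line straddle two reads and are rebuilt from the carried bytes.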


--
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
"The application did not fail successfully because of an error"


James Giles

2004-10-06, 3:59 am

glen herrmannsfeldt wrote:
> James Giles wrote:
>
> (snip)
>
>
>
>
> Text streams are the default, unless you add a b to the
> second argument to fopen(). It is more likely that you did
> use them, than that you didn't.


I don't recall. If I didn't, it must have been the case that I only
used C on UN*X style systems in which the "conversion" of record
marks was the identity operation. More likely, I always used
the 'b' on the fopen(), having once learned it and then just always
doing it because I knew it worked. (I once did site support for
C implementations, so presumably I once knew all this stuff.
But C is such an arcane and complex language that the details
pass from memory without the constant reinforcement of daily
use. I don't use C much at all any more. Something that deserves
the use of the following emoticon. :-)

In any case, fscanf() and fprintf() operate on either binary or
text files. I predict that a widely demanded extension to F2003
will be something that allows formatted I/O operations that
*aren't* mangled by altering the record mark(s). If this were
standard, then other non-standard extensions (like some kind
of CONVERT option on the OPEN statement) would not be
required in order to write portable, standard conforming
code to, say, read and write files that conform to the Mac
form on Windows or UN*X. Why anyone would want a
non-standard feature to do so when a properly designed
standard feature could do it is a mystery.

>
> I believe (someone will correct me if I am wrong) that non-advancing
> I/O can still have a record length limit. It is convenient for putting
> partial lines together, for example. The library may still have
> a line buffer to be written out with a record mark at the end.


Well, that would be a surprise to the people that wrote the
varying string standard (which is a supplementary standard
to Fortran). Their sample implementation uses nonadvancing
I/O to GET and PUT arbitrary length strings to/from single
records.

But, I can't find a line length limit in the Fortran standard
that pertains to nonadvancing I/O. I've implemented buffering
mechanisms such as you describe. On input, they read the
whole records, or as much as the buffer can hold. In the latter
case, they refill the buffer on the fly until the end of the record
is found. On output, they retain the data in the buffer until the
record is terminated, or until the buffer is full, and then they write
it out. In the latter case, the record is still not complete, and more
data can still be added until a write with ADVANCE='yes' is done.
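The input side of the buffering described above is what lets a record be longer than any buffer: a fixed buffer is refilled on the fly until the end of the record turns up. A C sketch of that side follows. It assumes '\n' as the record mark, a byte stream, and data with no zero bytes. read_record is a hypothetical helper, not part of any Fortran run-time library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads one '\n'-terminated record of any length using a small fixed
   buffer that is refilled until the end of the record is found; each piece
   is appended to a growing result. Returns a malloc'd, NUL-terminated copy
   without the '\n' (caller frees), or NULL at end of file or on error. */
char *read_record(FILE *f, size_t *out_len)
{
    char piece[16];                 /* deliberately small buffer */
    char *rec = NULL;
    size_t len = 0;

    assert(f != NULL && out_len != NULL);
    while (fgets(piece, sizeof piece, f) != NULL) {
        size_t n = strlen(piece);
        int done = (n > 0 && piece[n - 1] == '\n');
        if (done) n--;              /* do not keep the record mark */
        char *grown = realloc(rec, len + n + 1);
        if (grown == NULL) { free(rec); return NULL; }
        rec = grown;
        memcpy(rec + len, piece, n);
        len += n;
        rec[len] = '\0';
        if (done) break;            /* end of record reached */
    }
    if (rec != NULL) *out_len = len;
    return rec;
}
```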

Yes, implementations can cite the section 1 clause that permits
arbitrary limits on anything they want, but that would be the
only justification. But, they could use that excuse for any of a
large number of poor implementation choices: including
placing the *same* limit on the length of stream I/O records!!
So, there's still no functionality that streams provide that isn't
already covered by nonadvancing I/O.

--
J. Giles


glen herrmannsfeldt

2004-10-06, 3:59 am

TimC wrote:
(I wrote)

> In fact, in my compiler of choice (gcc), there is no difference - just
> as I think *should* be the case :)


Have you tried gcc on DOS or Windows systems?

I believe there is also a gcc for IBM's MVS which has files
with a record structure to them, and no record ending character.

I have before wanted a file with all 256 characters allowed
in addition to a record mark.

-- glen

Ken Plotkin

2004-10-06, 3:59 am

On Tue, 05 Oct 2004 14:35:16 +0200, Jan Vorbrüggen
<jvorbrueggen-not@mediasec.de> wrote:

>
>You do know that the FTP protocol makes the distinction between
>transferring a file in binary or in ASCII mode, do you not? This
>is exactly the same distinction.


I do know that. I also always use binary for ftp, even if the file is
ASCII. If the file is coming from a system with different record
terminators, I prefer to filter it after I get it, rather than leave
it up to the ftp client.

My fears were well-founded. :-) But thanks for the explanation.

Ken Plotkin

TimC

2004-10-06, 3:59 am

On Wed, 06 Oct 2004 at 03:14 GMT, glen herrmannsfeldt (aka Bruce)
was almost, but not quite, entirely unlike tea:
> TimC wrote:
> (I wrote)
>
>
>
> Have you tried gcc on DOS or Windows systems?


Hmmm. Windows isn't posix conforming, so the comment would imply
non-workingness:

This is strictly for compatibility with ANSI X3.159-1989 (``ANSI
C'') and has no effect; the ``b'' is ignored on all POSIX
conforming systems, including Linux. (Other systems may treat
text files and binary files differently, and adding the ``b'' may
be a good idea if you do I/O to a binary file and expect that your
program may be ported to non-Unix environments.)

What's the status of Windows NT? I guess it's only "posix conforming"
(for suitable values of conforming) when certain libraries are linked
in.

> I have before wanted a file with all 256 characters allowed
> in addition to a record mark.


Roll your own :)

Just pick a random character, and when that character pops up, escape
it (of course, you'll also have to escape the escape character).
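That suggestion is ordinary byte stuffing. The C sketch below reserves '\n' as the record mark, writes a data '\n' as ESC 'n', and writes ESC itself as ESC ESC. This lets all 256 byte values appear in the data. Using ESC (0x1B) and 'n' is an arbitrary choice for this sketch, and escape_bytes and unescape_bytes are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define ESC 0x1B   /* arbitrary escape byte chosen for this sketch */

/* Encodes in[0..n) so that '\n' never appears in the output. out needs
   room for 2 * n bytes. Returns the encoded length. */
size_t escape_bytes(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t k = 0;
    assert(in != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\n')     { out[k++] = ESC; out[k++] = 'n'; }
        else if (in[i] == ESC) { out[k++] = ESC; out[k++] = ESC; }
        else                     out[k++] = in[i];
    }
    return k;
}

/* Inverse of escape_bytes. Returns the decoded length, or (size_t)-1 if
   the input ends in a dangling ESC or contains an unknown escape. */
size_t unescape_bytes(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] != ESC) { out[k++] = in[i]; continue; }
        if (++i == n) return (size_t)-1;
        if (in[i] == 'n')      out[k++] = '\n';
        else if (in[i] == ESC) out[k++] = ESC;
        else                   return (size_t)-1;
    }
    return k;
}
```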

--
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
If you're too open minded, your brains will fall out. --unk


TimC

2004-10-06, 3:59 am

On Wed, 06 Oct 2004 at 03:17 GMT, Ken Plotkin (aka Bruce)
was almost, but not quite, entirely unlike tea:
> I do know that. I also always use binary for ftp, even if the file is
> ASCII. If the file is coming from a system with different record
> terminators, I prefer to filter it after I get it, rather than leave
> it up to the ftp client.
>
> My fears were well-founded. :-) But thanks for the explanation.


All I can say is, I'm glad that there is now rsync. Because if I ever
make that mistake again on a 1GB file transferred from across the
country, I can at least rectify it relatively cheaply :)

--
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
If a train station is a place where a train stops, what's a workstation?


Jan Vorbrüggen

2004-10-06, 3:59 am

You are simply wrong, because you are too narrow-minded. There is
no such thing as a "not a system text file [that] has ASCII character
data on it" - what would that be? Either the file conforms to the
hosting system OS's format for text files, or it doesn't. The first
case is what is covered by formatted I/O, the second case is covered
by unformatted I/O. In the latter case, there is no restriction on
what the file may contain - you may voluntarily restrict yourself to
just using the ASCII character set (possibly including some of the
control characters), but that is irrelevant to both Fortran and your
OS. In the latter case, there is also the distinction between a Fortran
record-oriented file and a stream file, with the former being a subset
of the latter, but that is again irrelevant to the subject of discussion,
formatted I/O. And for formatted I/O, it just is a fact of life that
different OSs have different mechanisms of indicating the end of a line
- for one, a CR is inserted, the next inserts a LF, the third inserts
a CR-LF pair, and the fourth, while supporting all of the previous ones,
uses a record-oriented mechanism by default. In order to write a portable
program, I want exactly the behaviour as specified in the standard. My
hat is off to those that put it in, because they got it right.
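Recognising the character-based conventions on input is simple, whichever layer does it. The C sketch below accepts CR, LF or CR LF as a line end. Record-oriented formats, as in the fourth case, have no terminator characters and are outside its scope. line_length is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the length of the line at the start of buf[0..n) and stores in
   *next the offset where the following line begins. Any of CR, LF or
   CR LF ends a line; a final line without a terminator is also accepted. */
size_t line_length(const char *buf, size_t n, size_t *next)
{
    size_t i = 0;

    assert(next != NULL);
    while (i < n && buf[i] != '\r' && buf[i] != '\n') i++;
    *next = i;
    if (i < n) {
        (*next)++;                                     /* skip CR or LF */
        if (buf[i] == '\r' && i + 1 < n && buf[i + 1] == '\n')
            (*next)++;                                 /* CR LF counts once */
    }
    return i;
}
```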

Jan


Jan Vorbrüggen

2004-10-06, 3:59 am

> I do know that. I also always use binary for ftp, even if the file is
> ASCII. If the file is coming from a system with different record
> terminators, I prefer to filter it after I get it, rather than leave
> it up to the ftp client.


Wrong on three counts: it's the FTP server that does the conversion - well,
in fact both server and client co-operate, each knowing the idiosyncrasies
of its host OS. Second, your approach doesn't trivially work on a PUT. Third,
try that with a VMS or VM text file, and show me the filter that will work
in all cases. Nah, I'd rather leave it to software that has the necessary
code built in by default.
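Jan's third point is that no simple filter handles VMS or VM text files. This is easier to see with a record format that stores lengths instead of terminator bytes. The C sketch below assumes a layout loosely modelled on VMS variable-length records: a 2-byte little-endian count, then the data, then a pad byte after odd-length records. The function decode_counted_records is hypothetical, and the layout is an assumption rather than a description of any particular system. None of the bytes that mark records are CR or LF, so a filter that only rewrites terminators has nothing to work on.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical decoder for counted records, assumed layout:
   2-byte little-endian length, the data, then one pad byte when the
   length is odd. Each record is copied to out followed by '\n'.
   Returns the number of bytes written, or -1 if the input is
   truncated or out is too small. */
long decode_counted_records(const unsigned char *in, size_t inlen,
                            char *out, size_t outcap)
{
    size_t i = 0, o = 0;
    while (i < inlen) {
        if (inlen - i < 2)
            return -1;                      /* truncated length word */
        size_t n = (size_t)in[i] | ((size_t)in[i + 1] << 8);
        i += 2;
        if (inlen - i < n || outcap - o < n + 1)
            return -1;                      /* truncated data or no room */
        memcpy(out + o, in + i, n);
        o += n;
        out[o++] = '\n';                    /* terminator chosen by us */
        i += n + (n & 1);                   /* skip pad byte if odd */
    }
    return (long)o;
}
```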

Jan
Jan Vorbrüggen

2004-10-06, 3:59 am

>>It does - stream format does not have any record length limitations.
> Neither does non-advancing I/O.


Hmmm - the RECL explicitly or implicitly specified in the OPEN statement
does not apply for non-advancing I/O? I'm surprised.

In fact, does, indeed, the description of stream I/O in the standard say
that RECL does not apply, as it should?

Jan
Jan Vorbrüggen

2004-10-06, 3:59 am

> In any case, fscanf() and fprintf() operate on either binary or
> text files.


Nonsense. Try generating output with an embedded zero using these
two functions.

Jan
James Giles

2004-10-06, 8:57 am

Jan Vorbrüggen wrote:
> You are simply wrong, because you are too narrow-minded. There is
> no such thing as a "not a system text file [that] has ASCII character
> data on it" - what would that be? Either the file conforms to the
> hosting system OS's format for text files, or it doesn't. The first
> case is what is covered by formatted I/O, the second case is covered
> by unformatted I/O. [...]


In that case *BOTH* uses are covered by *UNFORMATTED*
I/O since I don't need no stinking run-time to tell me how to do
streams.

You are simply wrong, because you are too narrow minded. You
want to *require* extra work for no practical benefit.

--
J. Giles


Richard E Maine

2004-10-06, 3:58 pm

Jan Vorbrüggen <jvorbrueggen-not@mediasec.de> writes:

> In fact, does, indeed, the description of stream I/O in the standard say
> that RECL does not apply, as it should?


In the description of RECL,

"The specifier shall not appear when a file is being connected for
stream access".

But more fundamentally, a stream file isn't composed of records in the
first place. This really is a quite fundamental difference and makes
the concept of RECL intrinsically nonsensical. Well, one *COULD*
force such a thing into the mix, but it just doesn't "fit" right;
the implied restrictions would be incredibly odd. For example, you
can generally write over characters in the middle of the file; this
includes writing over "record mark" characters, replacing them with
ordinary ones. So whether that was allowed or not would depend on
how distant the nearest other record mark characters were? Strange.
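Maine notes that a record mark in a stream is just a character that can be overwritten. This is easy to show in C, whose binary streams carry no record structure of their own. In this minimal sketch, merge_demo and count_markers are hypothetical names, and '\n' is taken as the record marker, as on Unix-like systems. The code writes two records to a temporary file, overwrites the first marker in place, and counts the markers again.

```c
#include <stdio.h>

/* Hypothetical helper: count '\n' record markers from the start. */
static int count_markers(FILE *fp)
{
    int c, n = 0;
    rewind(fp);                           /* also allows read after write */
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            ++n;
    return n;
}

/* Hypothetical demo: write two records, then overwrite the marker
   between them with an ordinary character. Marker counts before and
   after are returned through the out-parameters. */
int merge_demo(int *before, int *after)
{
    FILE *fp = tmpfile();                 /* opened "wb+", i.e. binary */
    if (!fp)
        return -1;
    fputs("abc\ndef\n", fp);
    *before = count_markers(fp);          /* 2 */
    fseek(fp, 3, SEEK_SET);               /* offset of the first '\n' */
    putc(' ', fp);                        /* file is now "abc def\n" */
    *after = count_markers(fp);           /* 1 */
    fclose(fp);
    return 0;
}
```

After the overwrite, the two records have become one, and the file's length in bytes has not changed.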

In the description of stream access (copied from the LaTeX source without
bothering to detex it)

When connected for formatted stream access, an external file has the
following properties:

\begin{enum}
\item Some file storage units of the file may contain record markers;
this imposes a record structure on the file in addition to its stream
structure. There might or might not be a record marker at the end of the
file. If there is no record marker at the end of the file, the final
record is incomplete.

\item No maximum length (\ref{D9:RECL= specifier in the OPEN
statement}) is applicable to these records.

\item Writing an empty record with no record marker has no effect.

\begin{note}
Because the record structure is determined from the record markers that
are stored in the file itself, an incomplete record at the end of the
file is necessarily not empty.
\end{note}


Oh. And also relevant further down is


\item A processor may prohibit some control characters
(\ref{D3:Processor character set}) from appearing in a formatted
stream file.

Essentially the same restriction also applies to non-stream files, by
the way. It just got specified separately for stream files for
organizational rather than technical reasons. (The restriction for
nonstream files is in the definition of formatted records instead of
in the definition of formatted files.)
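The quoted text sets out the record rules: markers impose the record structure, the final record may lack a marker, and an incomplete final record is never empty. These rules can be modelled with a short C function. The sketch takes '\n' as the record marker, which is an assumption because the actual marker is processor-dependent, and the name count_records is hypothetical.

```c
#include <stddef.h>

/* Hypothetical model: count records in buf[0..len), taking '\n' as
   the record marker. *incomplete is set to 1 when bytes after the
   last marker form an incomplete final record. Such a record is
   never empty: an empty tail just means the file ended with a
   marker. */
size_t count_records(const char *buf, size_t len, int *incomplete)
{
    size_t records = 0, start = 0;
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] == '\n') {
            ++records;
            start = i + 1;                /* next record starts here */
        }
    }
    *incomplete = (start < len);         /* leftover bytes, no marker */
    return records + (size_t)*incomplete;
}
```

Under this model, "ab\ncd" has two records and the second is incomplete, while "ab\ncd\n" has two complete records.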

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain | experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain
Jan Vorbrüggen

2004-10-06, 3:58 pm

> But more fundamentally, a stream file isn't composed of records in the
> first place. This really is a quite fundamental difference and makes
> the concept of RECL intrinsically nonsensical


Basically, I agree. There are however implementations which suffer from
limitations because of limited buffer sizes (necessarily so), where these
limitations shine through to the upper layers (which should not happen).
One of the few broken things in RMS, VMS's record management system.

Jan
James Giles

2004-10-06, 8:56 pm

Steve Lionel wrote:
> On Tue, 05 Oct 2004 22:36:19 GMT, "James Giles" <jamesgiles@worldnet.att.net>
> wrote:
>
>
> I disagree with this assertion. Non-advancing I/O is used to build up a
> record. In many if not most implementations, this is done by copying the data
> to an internal buffer and then writing that buffer to the external file at an
> appropriate time. An implementation may choose to allocate this buffer at a
> fixed size, thus making it possible to overflow it. This would be perfectly
> permissible by the standard.



Yes, but only by appealing to §1.4 (of the F2003 FCD):

This standard does not specify
[...]
(5) The size or complexity of a program and its data that
will exceed the capacity of any particular computing system
or the capability of a particular processor,
[...]

There is no other provision in the standard that establishes a
limit on the length of records that can be processed with non-
advancing I/O (unless the user applies one explicitly with a
RECL specification on the OPEN). Note however that the §1.4
provision also allows the *same* limits on the length of stream
records!! It's a catch-all that permits implementations to establish
any limits they like.

So, that makes it a quality of implementation issue. The fewer
arbitrary limits, the better the implementation's quality. That's
always been a widely quoted criterion. I would even say that
a buffer-size limit on the amount of data any individual READ
or WRITE statement processed, as long as it didn't limit the
number of such operations per record (when using nonadvancing
I/O), would not be an onerous limit.
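Giles suggests that a per-statement buffer limit would be acceptable as long as any number of transfers can go into one record. C's fgets() is a direct analogue: it moves at most a fixed number of bytes per call. The hypothetical function record_length below uses a deliberately tiny 8-byte buffer. It still measures records of any length, because it calls fgets() repeatedly until it sees the '\n' marker. It assumes the data contains no null bytes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the length of the next record (line)
   in fp, excluding its '\n', while holding at most CHUNK-1 data
   bytes at a time. Each fgets() call is one bounded transfer.
   Returns -1 at end of file when nothing was read.
   Assumes no null bytes in the data (strlen would miscount them). */
long record_length(FILE *fp)
{
    enum { CHUNK = 8 };                    /* deliberately tiny buffer */
    char buf[CHUNK];
    long total = 0;
    int got_any = 0;

    while (fgets(buf, sizeof buf, fp)) {
        size_t n = strlen(buf);
        got_any = 1;
        if (n > 0 && buf[n - 1] == '\n')
            return total + (long)(n - 1);  /* marker seen: record done */
        total += (long)n;                  /* partial record, continue */
    }
    return got_any ? total : -1;           /* last record, no marker */
}
```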

--
J. Giles


Richard E Maine

2004-10-06, 8:56 pm

In a rush (carpool about to show), so very quickly...

"James Giles" <jamesgiles@worldnet.att.net> writes:
[on the subject of record lengths and stream I/O vs nonadvancing]

> Note however that the §1.4
> provision also allows the *same* limits on the length of stream
> records!! It's a catch-all that permits implementations to establish
> any limits they like.
>
> So, that makes it a quality of implementation issue.


Indeed. I was initially surprised when record length was raised as
one of the big reasons why several vendors wanted formatted stream,
when I had proposed only unformatted ones.

Apparently, multiple vendors do consider allowing arbitrary record
lengths in sequential formatted files to be a significant
implementation complication. I forget the details, but it no doubt
relates to some of the things that we expect to be able to do
with such records/files.

So, yes, vendors could do this and you could make it a criterion
in vendor selection. But my understanding (second hand) is that
multiple vendors felt it substantially simpler to provide the
arbitrary record-length functionality via stream I/O instead of
by redoing their sequential formatted I/O stuff. Apparently it
would have required a pretty substantial redo.

All very much second hand. Also, as I said, I posted this a bit
in a rush. Carpool came and left for the car while I was typing.

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain | experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain
James Giles

2004-10-06, 8:56 pm

Jan Vorbrüggen wrote:
>
> So tell me how to write a formatted stream file - a text file to
> the host OS - with your specification and without knowledge of
> what the host OS's text file record markers are.


Ask. *THAT* was what I thought the NEW_LINE() intrinsic
was for. You ask what the newline character is, and then you
use it to read/write your files. To write a record, you stream
out the data and affix the character that the NEW_LINE() intrinsic
recommends. Another way to write the *same* thing is not
to use streams at all, but write the data with nonadvancing I/O.
That automatically uses the host's record conventions and
can otherwise do everything that streams do. Why, in your
challenge, do you insist that I must use streams? What advantage
do they provide? Be explicit.

So, tell me how *you* use a formatted stream to write a text file
that's acceptable to some *other* host using the existing F2003
specification. Tell me how you write a MAC file on UN*X,
for example. I'll wait... Nope? I didn't think so. With streams
as I thought they *should* work it's easy. You write the data you
want followed by a carriage return. No special meaning attached
to the line-feed at all - it can be part of the data, just like on
MACs. The same thing goes for writing UN*X files on Windows,
or - and here's a novel thought - write actual ASCII files (that is,
use the characters that the ASCII standard actually recommends for
record marks). *YOU* can't do that with formatted streams designed
as you prefer. (It's actually possible in C.)
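Giles mentions that C lets you write whatever terminator you like. It does this when the stream is opened in binary mode, so the library passes every byte through untranslated. In this sketch, write_records is a hypothetical helper name. With "\r" as the terminator it produces a classic Mac OS style file on any host, and "\r\n" would produce a Windows style file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write each record followed by the caller's
   chosen terminator. fp should be opened in binary mode ("wb"), so
   the bytes are not translated. In text mode a '\n' could be turned
   into the host's own convention. Returns 0 on success, -1 on a
   write error. */
int write_records(FILE *fp, const char *const recs[], size_t nrecs,
                  const char *terminator)
{
    size_t tlen = strlen(terminator);
    for (size_t i = 0; i < nrecs; ++i) {
        size_t rlen = strlen(recs[i]);
        if (fwrite(recs[i], 1, rlen, fp) != rlen ||
            fwrite(terminator, 1, tlen, fp) != tlen)
            return -1;
    }
    return 0;
}
```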


Now, let's review "Language design 101". In an existing language
that has been successful for a long time, the addition of any complex
new feature requires extraordinary justification. In an existing
language that already has substantial complexity, the addition of a
complex new feature requires extraordinary justification (regardless
of the age of the language).

Now, what constitutes "extraordinary justification"? First, if the
new feature provides capabilities that are redundant with existing
language features, the new feature must be *significantly* simpler
to use, read, and verify that it's correct than the existing features
doing the same operation. This must be an objective property:
that is, it must be something that everyone agrees is true about the
feature. Second, if the feature provides *new* capabilities that
you couldn't do before, (a) it must be demonstrated that there's a
real demand for such additional capability, and (b) it must be
demonstrably simpler to use, read, and verify that it's correct
than all the known alternative extensions to the language that
could provide the same new capability.

Well, I think that formatted streams fail to meet the test. For
those things that you can already do without stream I/O, continuing
to do them without stream I/O is to be preferred. It's simpler.
For the only thing so far claimed that you can't do without streams,
a simpler "extension" to the language would be to explicitly
state that nonadvancing I/O can handle arbitrary record lengths.
(I believe that such an explicit statement would merely be
a clarification, not an extension. But that makes no difference.
The point is that, given that explicit statement, nonadvancing
I/O on normal sequential files is superior to streams for
this purpose.)

Now, there are things that streams could (and probably should)
have permitted that would have been both new and useful. Of
course, it would still have to be debated whether streams were
the *best* way to do these things as opposed to some other
extension. But, since streams don't do these things, the *only*
way we'll get them is with some other extension(s). For example,
what I thought was the only really useful purpose of streams:
to process the data without the system applying any interpretation
to any of the characters with regard to record or file structure.

---

Now, after going through this debate, I've thought of some of
the other extensions that might be useful. Let's have a NEW_LINE
OPEN statement specifier. If you open a file for formatted I/O
you can specify what character(s) is(are) to be used as the record
mark. This would accomplish what I wanted to do with streams,
but without (necessarily) using streams.

For example, if you set NEW_LINE = "" (the empty string), then
*no* character would be interpreted as a record mark. That would
mean that all reads and writes would behave like nonadvancing
I/O (ADVANCE="no"). This would allow you to process
the file as a raw sequence of characters and treat them all as just
data.

For example, if you set NEW_LINE=achar(13), then the native
character corresponding to the ASCII carriage return would be
used as a record mark (for most systems, this would actually
be the ASCII CR). For those systems where ASCII was the native
character set, you would be reading and writing MAC files,
even though you're not necessarily on a MAC. You would
use ordinary READ and WRITE statements as usual with no
change in the default rules pertaining to ADVANCE.

For example, if you set NEW_LINE= achar(13)//achar(10), then
the native characters corresponding to the ASCII CR-LF sequence
would be used as a record mark (for most systems, this would actually
be the ASCII CR-LF). For those systems where ASCII was the native
character set, you would be reading and writing Windows files,
even though you're not necessarily on a Windows system.

This would necessitate a change to the NEW_LINE() intrinsic
function. It would have an optional second argument UNIT.
If the optional second argument was absent, NEW_LINE()
would work as F2003 already states. If the optional second
argument is present, it must be an integer whose value is
a unit number that's presently open for formatted I/O. The
result of NEW_LINE() in such a case would be the newline
character(s) specified for use when that unit was opened.
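The proposed NEW_LINE= specifier and NEW_LINE(UNIT) query can be modelled in C to show how record splitting would behave. Everything here is hypothetical. The names struct unit, newline_of and next_record stand in for an OPEN specifier and an intrinsic that F2003 does not have. An empty terminator turns the whole file into one undivided character sequence, as in Giles's first example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the proposed NEW_LINE= OPEN specifier. A
   "unit" remembers the terminator it was opened with; newline_of()
   plays the role of the proposed NEW_LINE(UNIT=...) query. These
   names exist in neither Fortran nor C. */
struct unit {
    const char *data;        /* file contents */
    size_t len, pos;         /* size and current position */
    const char *newline;     /* "" means: no record marks at all */
};

const char *newline_of(const struct unit *u) { return u->newline; }

/* Return the length of the next record and set *rec to its start,
   or return -1 at end of file. With an empty terminator, the rest of
   the file is one undivided sequence of characters. */
long next_record(struct unit *u, const char **rec)
{
    if (u->pos >= u->len)
        return -1;
    size_t tlen = strlen(u->newline), start = u->pos;
    *rec = u->data + start;
    if (tlen > 0) {
        for (size_t i = start; i + tlen <= u->len; ++i) {
            if (memcmp(u->data + i, u->newline, tlen) == 0) {
                u->pos = i + tlen;           /* consume the terminator */
                return (long)(i - start);
            }
        }
    }
    u->pos = u->len;                         /* no (more) terminators */
    return (long)(u->len - start);
}
```

For instance, with the terminator "\r\n", the data "ab\r\ncd\nef" yields the records "ab" and "cd\nef". The bare LF is treated as ordinary data.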

For those files for which the record structure is represented as
out-of-band information, the use of the NEW_LINE specified
in the OPEN statement would be prohibited (this is a concept that's
beyond the Fortran standard, so the actual statement would be
that there might be files for which the NEW_LINE specifier
is not allowed). Probably, this would be similar to the set of
files for which stream access is prohibited.

--
J. Giles



Ken Plotkin

2004-10-07, 3:59 am

On Wed, 06 Oct 2004 09:17:56 +0200, Jan Vorbrüggen
<jvorbrueggen-not@mediasec.de> wrote:


>Wrong on three counts: it's the FTP server that does the conversion - well,
>in fact both server and client co-operate, each knowing the idiosyncrasies
>of its host OS. Second, your approach doesn't trivially work on a PUT. Third,
>try that with a VMS or VM text file, and show me the filter that will work
>in all cases. Nah, I'd rather leave it to software that has the necessary
>code built in by default.


I don't think I ever did ftp with a VAX. I'll definitely keep your
warning in mind if I do.

The worst ASCII file transfer experience I had was with a Cyber. We
just could not get anything coherent across with ASCII transfer. We
finally resorted to transferring as binary, then decoding the
Cyberscii locally.

Ken Plotkin
Jan Vorbrüggen

2004-10-07, 3:59 am

> I don't think I ever did ftp with a VAX.

VMS now runs on Itanium also...no need to go to the computer museum.

> The worst ASCII file transfer experience I had was with a Cyber.


I can well imagine. It would be interesting to know whether the Cyber
or your host got it wrong. A lot of client implementations seem to think
that all the world is Windows or Unix.

Jan
Dave Thompson

2004-10-15, 3:56 am

On Wed, 06 Oct 2004 09:27:56 +0200, Jan Vorbrüggen
<jvorbrueggen-not@mediasec.de> wrote:

>> In any case, fscanf() and fprintf() operate on either binary or
>> text files.

That's true. They are rarely useful for binary, except when someone
like the previous poster is using binary for "Unixy text" rather than
"platform text" where that is different, but they are perfectly legal
and well-defined.

> Nonsense. Try generating output with an embedded zero using these
> two functions.
>

Assuming you mean input and output, respectively, of a zero byte
(aka the null character), not a digit zero:

#include <stdio.h>

int main (void) {
    FILE * infile = fopen ("in", "rb"), * outfile = fopen ("out", "wb");
    if( ! infile || ! outfile ){ /* error */ return 1; }
    char foo[10];
    if( fscanf (infile, "%10c", foo) != 1 ){ /*error*/ return 1; }
    /* foo can contain null character(s), in which case
       I can't just fputs() it, or strlen() it, or strchr() things
       in it, etc. But I can access the chars fine */
    for( size_t i = 0; i < sizeof foo; ++i )
        fprintf (outfile, "char#%zu is ->%c<-\n", i, foo[i]);
    fclose (infile);
    fclose (outfile);
    return 0;
}

You can even fgets() a line from a file (either binary or text)
containing nulls, but you can't just use the result as a C string; you
have to prefill the buffer and afterwards search for the *last* null,
so it's much easier to just code a loop around getc().
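Dave's fgets() technique can be written out as follows. The helper fgets_with_nulls is a hypothetical name. It prefills the buffer with a non-null byte, so after the call the last null in the buffer must be the terminator that fgets() stored. This relies on fgets() leaving the bytes after that terminator untouched. Common implementations behave this way, but it is an assumption here. As Dave says, a loop around getc() is usually simpler.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets() even if it contains
   null characters. Prefill with a non-null byte, then take the LAST
   null in the buffer as the one fgets() stored after the data.
   Requires size > 0. Returns the number of bytes read (including any
   '\n'), or -1 at end of file or on error. */
long fgets_with_nulls(char *buf, int size, FILE *fp)
{
    memset(buf, 'x', (size_t)size);       /* any non-null filler works */
    if (!fgets(buf, size, fp))
        return -1;
    for (int i = size - 1; i >= 0; --i)   /* scan back for the null */
        if (buf[i] == '\0')
            return i;
    return -1;                            /* not reached in practice */
}
```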

- David.Thompson1 at worldnet.att.net

Copyright 2008 codecomments.com