Namelist I/O and direct access files
Madhusudan Singh

2004-03-28, 10:29 pm

Hi

I am trying to protect a simulation against interruptions by dumping a set
of variables (a mixed set of integers, integer arrays, reals and real
arrays) onto a namelist file at regular intervals.

To save disk space and to reduce the time taken for file I/O, I am trying to
use a direct access file. Now, to determine the record length, the
suggested method is to use the inquire statement :

inquire(iolength=reclength) output_list

Now, I tried to use the name of the namelist as the output_list. I did not
know whether this was legal usage but I was trying my luck. However, IFC
complains loudly when I do this.

Is there some way to determine the record length ? My namelist, as mentioned
before, is quite heterogeneous from a type standpoint.

I am using :

open(unit=666,file='vardump.nml',access='direct',form='unformatted', &
     delim='apostrophe',recl=reclength)

write(666,nml=vardump)

close(666)

I am not using the rec= argument in the write statement as I am using a
namelist with all kinds of fields at one fell stroke.

And did I mention, I am new to namelist and direct access I/O both.

Thanks,

MS
Ken Plotkin

2004-03-28, 10:29 pm

On Sat, 27 Mar 2004 23:52:50 -0500, Madhusudan Singh
<spammers-go-here@yahoo.com> wrote:

>I am trying to protect a simulation against interruptions by dumping a set
>of variables (a mixed set of integers, integer arrays, reals and real
>arrays) onto a namelist file at regular intervals.
>
>To save disk space and to reduce the time taken for file I/O, I am trying to
>use a direct access file. Now, to determine the record length, the
>suggested method is to use the inquire statement :

[snip]

I think there are some incompatibilities in your open. But why be
that elaborate? If the file is to be read by the program when you
re-start it, consider using just an unformatted file - no namelist, no
direct access.

open(unit=666,file='vardump.nml',form='unformatted')
write(666) [whatever you want to save]
close(666)

FWIW, this is how "save" works in Adventure games. "Restore" does
exactly the same thing, but with read instead of write.

If you need to pick through the entrails of a save file, you can
write a separate program with matching "read", then write what you
want in human-readable form.
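
A minimal sketch of that save/restore pair, assuming a state made of one integer and one double precision array. The unit number, the file name vardump.unf, and the variable names are illustrative and not taken from the original program.

```fortran
program checkpoint_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4000          ! array size is illustrative
  integer  :: step, step_in, ios
  real(dp) :: energy(n), saved(n)

  step = 42
  call random_number(energy)

  ! Save: one unformatted sequential record holding the whole state.
  open(unit=20, file='vardump.unf', form='unformatted', &
       status='replace', action='write', iostat=ios)
  if (ios /= 0) stop 'cannot open checkpoint for writing'
  write(20) step, energy
  close(20)

  ! Restore: the same list, with read instead of write.
  open(unit=20, file='vardump.unf', form='unformatted', &
       status='old', action='read', iostat=ios)
  if (ios /= 0) stop 'cannot open checkpoint for reading'
  read(20) step_in, saved
  close(20)

  print *, 'restored step:', step_in
  print *, 'values identical:', all(saved == energy)
end program checkpoint_demo
```

The read list has to match the write list in order, type, and size, which is why keeping both lists next to each other (or in one shared place) helps.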

Ken Plotkin

Madhusudan Singh

2004-03-28, 10:29 pm

On Sunday 28 March 2004 00:56, Ken Plotkin (kplotkin@nospam-cox.net) held
forth in comp.lang.fortran (<eepc60doq3vhtepdteft8i4s7eha5almbn@4ax.com> ):

> On Sat, 27 Mar 2004 23:52:50 -0500, Madhusudan Singh
> <spammers-go-here@yahoo.com> wrote:
>
> [snip]
>
> I think there are some incompatibilities in your open. But why be
> that elaborate? If the file is to be read by the program when you
> re-start it, consider using just an unformatted file - no namelist, no
> direct access.


Thanks for your response.

I want to use a namelist because it makes my job of reading and writing
infinitely easier.

What incompatibilities does the open I used have ?

I used :

open(unit=666,file='vardump.nml',access='direct',form='unformatted', &
     delim='apostrophe',recl=reclength)
James Van Buskirk

2004-03-28, 10:29 pm

"Madhusudan Singh" <spammers-go-here@yahoo.com> wrote in message
news:c45qqf$2flsg2$1@ID-159130.news.uni-berlin.de...

> What incompatibilities does the open I used have ?


>
> open(unit=666,file='vardump.nml',access='direct',form='unformatted', &
>      delim='apostrophe',recl=reclength)

The DELIM= specifier is permitted only for a file being connected
for formatted input/output. (ISO/IEC 1539-1:1997(E) Section 9.3.4.9)

If the data transfer statement contains a format or namelist-group-
name, the statement is a formatted input/output statement; otherwise
it is an unformatted input/output statement. (ibid. Section 9.4.1)

You seem to be mixing formatted output (NAMELIST and DELIM=) with
unformatted output (FORM='UNFORMATTED'.) I don't think you can
get NAMELIST to input/output the internal representation of your
data as unformatted input/output would do.
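
For contrast, a hedged sketch of a combination that is consistent: namelist I/O on a formatted, sequential file, where DELIM= is allowed. The group name vardump comes from the original post; the variables in the group and the unit number are assumptions made for illustration.

```fortran
program namelist_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer           :: step
  real(dp)          :: temps(3)
  character(len=8)  :: label
  namelist /vardump/ step, temps, label

  step  = 7
  temps = (/ 1.5_dp, 2.5_dp, 3.5_dp /)
  label = 'run-a'

  ! Namelist I/O is formatted and sequential, so DELIM= is legal here.
  open(unit=20, file='vardump.nml', form='formatted', &
       access='sequential', delim='apostrophe', status='replace')
  write(20, nml=vardump)
  close(20)

  ! Clear the variables, then restore them by name from the file.
  step = 0;  temps = 0.0_dp;  label = ''
  open(unit=20, file='vardump.nml', form='formatted', status='old')
  read(20, nml=vardump)
  close(20)

  print *, step, temps, label
end program namelist_demo
```

The resulting file is plain text with one name=value entry per variable, which is what makes it easy to inspect or edit by hand.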

--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


Ken Plotkin

2004-03-28, 10:29 pm

On Sun, 28 Mar 2004 01:19:58 -0500, Madhusudan Singh
<spammers-go-here@yahoo.com> wrote:


>I want to use a namelist because it makes my job of reading and writing
>infinitely easier.


Machine reading or human reading?

I have never used namelist, but I understand that it makes the human
side of preparing and browsing data files easier, if more verbose.
But it should be slower than simpler read/write.

If you want to be able to browse your data dump file, you should not
use unformatted - unless you're skilled at reading hex dumps, like the
characters in "The Matrix." James has pointed out that "unformatted"
in your open is not compatible with the rest - which is what I
figured.

If speed of writing/reading the dump matters, then simple sequential
"unformatted" (or "stream"/"binary"/"transparent", if supported by
your compiler) is the fastest, simplest way.

Ken Plotkin

Richard Maine

2004-03-28, 10:29 pm

Madhusudan Singh <spammers-go-here@yahoo.com> writes:

> To save disk space and to reduce the time taken for file I/O, I am trying to
> use a direct access file.


I'm not quite sure what direct access has to do with saving disk
space or reducing time taken. If you want to randomly access
some part of the file, then direct access has a large effect on the
time, yes. It isn't clear that you are doing that. And I can't
figure out the correlation with disk space at all. I suppose you
save a trivial amount in header size, but that is likely to be
orders of magnitude less than it costs to pad all records to the same
size if they aren't inherently already the same size.

> Now, I tried to use the name of the namelist as the output_list.


That isn't allowed...and doesn't make any sense.

The io length inquire is only relevant for unformatted I/O. It is
necessary because the units of length are unspecified for unformatted
I/O. For formatted I/O, the units are specified to be characters,
so no need for any such construct.

Also, I see a huge disconnect here. You initially said that you were
using direct access to save time and space. Although I don't see
how direct access helps that, I at least take the message that you
think it important to minimize time and space. But...
namelist is probably the single *WORST* option from both time and
space perspectives. This just does not make sense.

> I am not using the rec= argument in the write statement as I am using a
> namelist with all kinds of fields at one fell stroke.


That has nothing to do with it. The rec= is *REQUIRED* for direct
access, period. Even if there is only one record, you still have
to specify the rec=1.
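
A small sketch of what a valid unformatted direct access write could look like, assuming a mixed list of integers and reals. The variable names, sizes, and unit number are made up for the example. Note that IOLENGTH= takes an ordinary output list, which may mix types and array sizes, and that rec= appears in the write.

```fortran
program direct_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer  :: step, counts(10), reclength
  real(dp) :: x, table(100)

  step = 1;  counts = 0;  x = 0.5_dp;  table = 1.0_dp

  ! IOLENGTH= takes an output list (not a namelist group name) and returns
  ! the length, in processor-dependent units, of an unformatted record
  ! that holds that whole list.
  inquire(iolength=reclength) step, counts, x, table

  open(unit=20, file='vardump.unf', access='direct', form='unformatted', &
       recl=reclength, status='replace')
  write(20, rec=1) step, counts, x, table     ! rec= is mandatory
  close(20)

  print *, 'record length in processor units:', reclength
end program direct_demo
```

The value printed depends on the compiler and its options, since the unit of RECL= for unformatted files is processor dependent.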

> And did I mention, I am new to namelist and direct access I/O both.


Guess I can tell. :-)

You really don't want direct access here. Looking at the code, it just
isn't doing anything for you; all it is doing is causing problems and
it isn't helping anything.

Namelist is ok for the purpose, but

1. Namelist is for formatted, sequential I/O - not unformatted
direct access.

2. Namelist is slow and space-consuming instead of fast and
compact. But do you really care? Unless you've got some
huge arrays buried in there or unless you do this in the
middle of some time-critical loop, I can't see how either the
space or time would be at issue.

Use namelist if you want a simple way to write a file that will
be easy for a human to read.

--
Richard Maine
email: my last name at domain
domain: sumertriangle dot net
Gordon Sande

2004-03-28, 10:29 pm

In article <geuc60th489ejphtcnsop4mc9lc6iac0ji@4ax.com>,
Ken Plotkin <kplotkin@nospam-cox.net> wrote:

>Subject: Re: Namelist I/O and direct access files
>From: Ken Plotkin <kplotkin@nospam-cox.net>
>Organization: Guybrush Threepwood Fan Club
>Date: Sun, 28 Mar 2004 02:21:48 -0500
>Newsgroups: comp.lang.fortran
>
>On Sun, 28 Mar 2004 01:19:58 -0500, Madhusudan Singh
><spammers-go-here@yahoo.com> wrote:
>
>
>
>Machine reading or human reading?
>
>I have never used namelist, but I understand that it makes the human
>side of preparing and browsing data files easier, if more verbose.
>But it should be slower than simpler read/write.
>
>If you want to be able to browse your data dump file, you should not
>use unformatted - unless you're skilled at reading hex dumps, like the
>characters in "The Matrix." James has pointed out that "unformatted"
>in your open is not compatible with the rest - which is what I
>figured.
>
>If speed of writing/reading the dump matters, then simple sequential
>"unformatted" (or "stream"/"binary"/"transparent", if supported by
>your compiler) is the fastest, simplest way.


The original intent was to protect against interruptions. In such a
situation any i/o will be slow compared to the main computation.
A serious concern is whether the interruption happens during the
save and results in a partial save. One solution is to alternate
between two save files with suitable logic to determine if the save
was complete and which save is the later one when doing the restore.

Having a corrupt save file is a BIG nuisance!

One advantage of human readable save files is that you can change
the contents (eg how often to save) before doing the restore.
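
One possible shape for the two-file scheme, as a sketch only. The file names, the module and routine names, and the header/trailer layout are all assumptions; the idea is that a save counts as complete only if its trailer record can be read back and matches the header.

```fortran
module checkpoint_pair
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  character(len=*), parameter :: names(2) = (/ 'dump_a.unf', 'dump_b.unf' /)
contains

  ! Write save number seq to the file not used by save number seq-1.
  subroutine save_state(seq, state)
    integer,  intent(in) :: seq
    real(dp), intent(in) :: state(:)
    open(unit=20, file=names(mod(seq, 2) + 1), form='unformatted', &
         status='replace', action='write')
    write(20) seq, size(state)
    write(20) state
    write(20) seq          ! trailer: present only if the save finished
    close(20)
  end subroutine save_state

  ! Return the newest complete save, or seq = -1 if neither file is usable.
  subroutine restore_state(seq, state)
    integer,  intent(out)   :: seq
    real(dp), intent(inout) :: state(:)
    real(dp) :: buf(size(state))
    integer  :: i, head, n, tail, ios
    seq = -1
    do i = 1, 2
      open(unit=20, file=names(i), form='unformatted', status='old', &
           action='read', iostat=ios)
      if (ios /= 0) cycle
      read(20, iostat=ios) head, n
      if (ios == 0) then
        if (n /= size(state)) ios = 1      ! wrong size: treat as unusable
      end if
      if (ios == 0) read(20, iostat=ios) buf
      if (ios == 0) read(20, iostat=ios) tail
      close(20)
      if (ios /= 0) cycle
      if (tail == head .and. head > seq) then
        seq = head
        state = buf
      end if
    end do
  end subroutine restore_state
end module checkpoint_pair

program demo
  use checkpoint_pair
  implicit none
  real(dp) :: s(5)
  integer  :: k
  s = 1.0_dp
  call save_state(1, s)
  s = 2.0_dp
  call save_state(2, s)
  s = 0.0_dp
  call restore_state(k, s)
  print *, 'restored save', k, ' first value', s(1)
end program demo
```

This does not guard against the operating system buffering the data after close; it only detects a file that was visibly cut short.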

>Ken Plotkin
>





Gary L. Scott

2004-03-28, 10:29 pm

Richard Maine wrote:
>
> Madhusudan Singh <spammers-go-here@yahoo.com> writes:
>
>
> I'm not quite sure what direct access has to do with saving disk
> space or reducing time taken. If you want to randomly access
> some part of the file, then direct access has a large effect on the
> time, yes.


I've never quite understood how direct access guarantees anything about
access time. Can't the processor just simulate all of the implied
functionality with a regular sequential file and backspaces and rewinds
and counting records? Does direct access force some sort of special
directory or file structure (I know fixed size records, but don't you
still have to increment the file position pointer on a byte or word
basis?? Do file systems allow direct "block" or "sector" position jumps
(I assume so, but never studied)?)? P.S. on some older systems a
"sector" is synonymous with a "block" so I'm not quite sure what I'm
saying here relative to current systems.





--

Gary Scott
mailto:garyscott@ev1.net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

Why are there two? God only knows.

Democracy is two wolves and a sheep, voting on what to eat for dinner...
Liberty is a well armed sheep contesting the vote. - Thomas Jefferson
Richard Maine

2004-03-28, 10:29 pm

"Gary L. Scott" <garyscott@ev1.net> writes:

> Richard Maine wrote:


> I've never quite understood how direct access guarantees anything about
> access time.


It doesn't guarantee anything. But it pretty explicitly puts the vendor
on notice that random access is likely. And it has limits explicitly
designed to make fast random access practical, so... in actual
practice, random access is fast. No esoteric theory involved here.
Trust me, it is *VERY* easy to measure. I have files that
approach 2 GB (I still conscientiously stay under that limit for
portability, but sometimes just barely under). It is not at all
unusual to want to get a small portion from near the end of such
a file. And it isn't difficult at all to see the difference between doing
so sequentially or randomly.

> Can't the processor just simulate all of the implied
> functionality with a regular sequential file and backspaces and rewinds
> and counting records?


Yes, a processor *CAN* and even may implement direct access this way.
I've seen it done. However, it isn't the usual choice. (Mostly where
I've seen direct access done like that is where a processor supports
creating a file as sequential, but then accessing it as direct access,
handling the variable length records.)

> I know fixed size records, but don't you
> still have to increment the file position pointer on a byte or word
> basis??


NO! NO! NO!

> Do file systems allow direct "block" or "sector" position jumps


Absolutely yes. Almost more the other way - that they tend to be
written in terms of block access, and it is just "handy" if two
blocks happen to be accessed in sequence. (Well, the speed
issues of adjacent disk block access are so "handy" that some
file systems go to quite a lot of trouble to optimize such things,
but that's at a deeper level than I normally play.)

--
Richard Maine
email: my last name at domain
domain: sumertriangle dot net
Madhusudan Singh

2004-03-28, 10:29 pm

On Sun, 28 Mar 2004 09:56:26 -0800, Richard Maine wrote:

> Madhusudan Singh <spammers-go-here@yahoo.com> writes:
>
>


Thanks for your detailed response and for bearing with someone who is new
to namelist and unformatted I/O.

> I'm not quite sure what direct access has to do with saving disk
> space or reducing time taken. If you want to randomly access


According to an example on Pg 572 of Chapman's Fortran 90/95 for
Scientists and Engineers, unformatted files are much smaller than their
formatted counterparts.

I also read on Pg 569 that :

"Direct access unformatted files whose record length is a multiple of the
sector size of a particular computer are the most efficient Fortran files
possible on that computer." (verbatim).

> some part of the file, then direct access has a large effect on the
> time, yes. It isn't clear that you are doing that. And I can't figure


Correct. All I am trying to accomplish is the following :

1. At regular intervals, dump some variables to a file on the disk so that
my long Monte Carlo simulation can be restarted from where it left off.
2. At the beginning of each run, the program checks for the existence of
the file vardump.nml so that it can detect an interruption in a prior run.
3. I want the file to be as small as possible and as quick in I/O as
possible. Since, I would necessarily be accessing this file in a
sequential fashion (when restoring after an interruption), I guess the
speed gain does not matter (as you hint above).
4. The file vardump.nml does not have to be human readable. So the use of
an unformatted file is possible, and considering 3, even imperative.
5. I do not want to introduce any truncation or rounding errors through
file I/O alone.


>
> That isn't allowed...and doesn't make any sense.


I suspected as much, but I guess I was pushing my luck !

I came back to this part after reading the rest of your post. The message
is that namelist directed unformatted I/O is a contradiction in terms. Ok.
So, if I choose to use unformatted I/O, would the following statements be
ok :

open(unit=666,file='vardump.unf',access='direct',form='unformatted', &
     recl=reclength)

(direct access implies unformatted by default, but I am just making things
extra explicit here)

Now reclength would presumably be different for different types and
arrays of different sizes, etc. Or can one dump only variables of a
single kind through direct access ?

If not, how does one specify a unique value of reclength above ?

( I do not expect inquire(iolength=reclength)
a_single_double_precision_number and inquire(iolength=reclength)
a_single_integer to return the same values for reclength). However, I
remember reading that reclength is a processor dependent unique number. I
know from my earlier work that storing a real (for instance, IEEE
754) takes more bits than storing an integer does. So how can this be ? Or
is it that a character is stored in one record, an integer in n records,
and a double precision value in m records, where m>n ?

In an example on page 571, Chapman uses a real array in the inquire
statement, suggesting that somehow an array *could* return a different
record length - multiples of a single real reclength, perhaps. I guess my
confusion on this point really shows :)

> direct access helps that, I at least take the message that you think it
> important to minimize time and space. But... namelist is probably the
> single *WORST* option from both time and space perspectives. This just
> does not make sense.


Thanks for pointing that out. I wanted to use a namelist because it
considerably simplifies my coding. If I need a variable to be backed up,
I just add it to the namelist and let the write statement figure out how
it wishes to handle it. And use unformatted I/O. Such delusions are cured
now :)


>
>
> Guess I can tell. :-)
>
> You really don't want direct access here. Looking at the code, it just
> isn't doing anything for you; all it is doing is causing problems and it
> isn't helping anything.
>
> Namelist is ok for the purpose, but
>
> 1. Namelist is for formatted, sequential I/O - not unformatted
> direct access.


I guess that settles it. Ease of coding vs. saving disk space. I think I
will go for the latter.

>
> 2. Namelist is slow and space-consuming instead of fast and
> compact. But do you really care? Unless you've got some huge arrays
> buried in there or unless you do this in the middle of some
> time-critical loop, I can't see how either the space or time would be
> at issue.


Well, I have something like 10-15 real (double precision) arrays each with
4000+ entries. So, I think space could be a problem. Some of the machines
I run this program on, have relatively limited diskspace.

>
> Use namelist if you want a simple way to write a file that will be easy
> for a human to read.


I guess I was trying to have the best of both worlds. Ease of coding as
well as savings in disk space.
Madhusudan Singh

2004-03-28, 10:29 pm

On Sun, 28 Mar 2004 02:21:48 -0500, Ken Plotkin wrote:

> On Sun, 28 Mar 2004 01:19:58 -0500, Madhusudan Singh
> <spammers-go-here@yahoo.com> wrote:
>
>

Thanks for your response.
>
> Machine reading or human reading?


I am not interested in human readable files for this purpose.

>
> I have never used namelist, but I understand that it makes the human
> side of preparing and browsing data files easier, if more verbose.
> But it should be slower than simpler read/write.


That is the message from Richard's exhaustive post in this thread.

>
> If you want to be able to browse your data dump file, you should not
> use unformatted - unless you're skilled at reading hex dumps, like the
> characters in "The Matrix." James has pointed out that "unformatted"
> in your open is not compatible with the rest - which is what I
> figured.


Well, as I said, I am just interested in using a small file that my
program can recognize when it needs to restart.

>
> If speed of writing/reading the dump matters, then simple sequential
> "unformatted" (or "stream"/"binary"/"transparent", if supported by your
> compiler) is the fastest, simplest way.


I guess so.

Madhusudan Singh

2004-03-28, 10:29 pm

On Sun, 28 Mar 2004 06:43:43 +0000, James Van Buskirk wrote:

> "Madhusudan Singh" <spammers-go-here@yahoo.com> wrote in message
> news:c45qqf$2flsg2$1@ID-159130.news.uni-berlin.de...
>
>
> open(unit=666,file='vardump.nml',access='direct',form='unformatted', &
>      delim='apostrophe',recl=reclength)
>
> The DELIM= specifier is permitted only for a file being connected
> for formatted input/output. (ISO/IEC 1539-1:1997(E) Section 9.3.4.9)
>
> If the data transfer statement contains a format or namelist-group-
> name, the statement is a formatted input/output statement; otherwise
> it is an unformatted input/output statement. (ibid. Section 9.4.1)
>
> You seem to be mixing formatted output (NAMELIST and DELIM=) with
> unformatted output (FORM='UNFORMATTED'.) I don't think you can
> get NAMELIST to input/output the internal representation of your
> data as unformatted input/output would do.



Thanks for pointing that out. As a newbie to these features, I was trying
to have the best of both worlds - ease of coding with namelists and save
space with unformatted files :)

Madhusudan Singh

2004-03-28, 10:30 pm

On Sun, 28 Mar 2004 14:07:41 -0400, Gordon Sande wrote:


>
> The original intent was to protect against interruptions. In such a
> situation any i/o will be slow compared to the main computation.
> A serious concern is whether the interruption happens during the
> save and results in a partial save. One solution is to alternate
> between two save files with suitable logic to determine if the save
> was complete and which save is the later one when doing the restore.
>
> Having a corrupt save file is a BIG nuisance!
>
> One advantage of human readable save files is that you can change
> the contents (eg how often to save) before doing the restore.


Thanks for pointing that issue out. I had not considered this case. Maybe
the way is to write the number of records to be saved to a second file,
then execute the save, then append the number of records actually written
to that second file. When restoring, compare the two numbers. If the second
number is missing, one knows the save was incomplete.

An improvement can be to have two running variable dumps, not one. So that
if the newer dump was interrupted, one can go back to the previous dump :)

Atomicity is important.
Ken Plotkin

2004-03-28, 10:30 pm

On Sun, 28 Mar 2004 15:52:08 -0500, Madhusudan Singh
<spammers-go-here@nowhere.now> wrote:

[snip]
>I also read on Pg 569 that :
>
>"Direct access unformatted files whose record length is a multiple of the
>sector size of a particular computer are the most efficient Fortran files
>possible on that computer." (verbatim).


There is a good potential for that, but it depends on the
implementation. For one thing, the Fortran I/O system would have to
know the sector size of the disk to which it is writing - and that can
be different for each disk on a system.

BTW - notwithstanding Richard's experience about the efficiency of
direct access files, I have encountered one compiler where direct
access was abysmally slow. It was, in fact, faster to open a file as
sequential and read past everything than to use direct access.

[snip]
>5. I do not want to introduce any truncation or rounding errors through
>file I/O alone.


Another point in favor of unformatted - you get a direct binary
transfer of what's in memory.

[snip]
>open(unit=666,file='vardump.unf',access='direct',form='unformatted', &
>     recl=reclength)
>
>(direct access implies unformatted by default, but I am just making things
>extra explicit here)


That's not correct. Direct access can apply to either unformatted or
formatted files.

>
>Now reclength would presumably be different for different types and
>arrays of different sizes, etc. Or can one dump only variables of a
>single kind through direct access ?
>
>If not, how does one specify a unique value of reclength above ?


Unless you plan on accessing a record in the middle of your file, I
would recommend not doing direct access. Figuring record length, then
dealing with a particular value, can get annoying.

[snip]
>...I wanted to use a namelist because it
>considerably simplifies my coding. If I need a variable to be backed up,
>I just add it to the namelist and let the write statement figure out how
>it wishes to handle it. And use unformatted I/O. Such delusions are cured
>now :)


With a plain unformatted write, you have your write list in just three
places: the two alternating writes (Gordon's point about that is
excellent) and the read. You could use an include file to make sure
you spell it identically in all three places:

write(666)
include 'mystuff.inc'

and mystuff.inc consists of:

+ a,b,c,d,
+ f,g,h, etc.

[snip]
>Well, I have something like 10-15 real (double precision) arrays each with
>4000+ entries. So, I think space could be a problem. Some of the machines
>I run this program on, have relatively limited diskspace.


Wow!! For efficiency, do your unformatted write all at once. My
experience has been that

write(13) a,b

is faster than

write(13) a
write(13) b

and not just on the rogue compiler with the snail-like direct
performance. Unformatted, the second form will use more disk space,
since unformatted generally puts record size information into each
write.

Ken Plotkin

Madhusudan Singh

2004-03-28, 10:30 pm

On Sun, 28 Mar 2004 16:34:48 -0500, Ken Plotkin wrote:

>
> There is a good potential for that, but it depends on the
> implementation. For one thing, the Fortran I/O system would have to
> know the sector size of the disk to which it is writing - and that can
> be different for each disk on a system.
>
> BTW - notwithstanding Richard's experience about the efficiency of
> direct access files, I have encountered one compiler where direct
> access was abysmally slow. It was, in fact, faster to open a file as
> sequential and read past everything than to use direct access.


Which compiler was this ? I use Sun f90 and IFC, and will probably add
NAGware to that list in a few months, so I should know what to watch out
for :)

>
> [snip]
>
> Another point in favor of unformatted - you get a direct binary transfer
> of what's in memory.


Precisely, which is why I want to use unformatted I/O.

>
> [snip]
>
> That's not correct. Direct access can apply to either unformatted or
> formatted files.


I know that. I just said that direct access implies unformatted by default
(according to Chapman Pg. 569.).

>
>
>
> Unless you plan on accessing a record in the middle of your file, I
> would recommend not doing direct access. Figuring record length, then
> dealing with a particular value, can get annoying.
>


So, I should go for sequential unformatted I/O ?


>
> Wow!! For efficiency, do your unformatted write all at once. My
> experience has been that
>
> write(13) a,b
>
> is faster than
>
> write(13) a
> write(13) b


That is excellent advice. Do I take it that 'a', 'b' above can be arrays
(subscripted or otherwise) ?

glen herrmannsfeldt

2004-03-28, 10:30 pm

Richard Maine wrote:

> "Gary L. Scott" <garyscott@ev1.net> writes:

(snip)

> It doesn't guarantee anything. But it pretty explicitly puts the vendor
> on notice that random access is likely. And it has limits explicitly
> designed to make fast random access practical, so... in actual
> practice, random access is fast. No esoteric theory involved here.
> Trust me, it is *VERY* easy to measure. I have files that
> approach 2 GB (I still conscientiously stay under that limit for
> portability, but sometimes just barely under). It is not at all
> unusual to want to get a small portion from near the end of such
> a file. And it isn't difficult at all to see the difference between doing
> so sequentially or randomly.


Well, with sequential files, BACKSPACE and REWIND it is usual,
if not required, that a WRITE erases, or makes unavailable,
everything after that point in the file. To me the main
point of direct access is that it doesn't do that.

> Yes, a processor *CAN* and even may implement direct access this way.
> I've seen it done. However, it isn't the usual choice. (Mostly where
> I've seen direct access done like that is where a processor supports
> creating a file as sequential, but then accessing it as direct access,
> handling the variable length records.)



On IBM S/360, the first system where I knew Fortran
DIRECT ACCESS files (using the DEFINE FILE statement),
only fixed length records are allowed. The file is then
written onto the disk with physical disk blocks of that
size. Unlike popular Unix and DOS/Windows disks, IBM
S/360 and its successors do not use fixed 512 byte blocks.

A direct access READ or WRITE then translates to an
I/O operation to the hardware to read or write that block.
File pointers are in terms of cylinders, tracks, and record
within the track, not byte offsets.

On Unix/DOS/Windows it likely multiplies the record number
by the record length and asks the OS for bytes at that location.
The OS will then read the appropriate disk blocks, and supply
the requested bytes.
> NO! NO! NO!


> Absolutely yes. Almost more the other way - that they tend to be
> written in terms of block access, and it is just "handy" if two
> blocks happen to be accessed in sequence. (Well, the speed
> issues of adjacent disk block access are so "handy" that some
> file systems go to quite a lot of trouble to optimize such things,
> but that's at a deeper level than I normally play.)


Systems with a disk cache, and NFS servers, often read ahead,
assuming that the following blocks will be read soon. Since
file systems with 512 byte physical blocks are often used with
other sized logical blocks, that still makes sense.

-- glen




glen herrmannsfeldt

2004-03-28, 10:30 pm

Ken Plotkin wrote:

(big snip)

> Wow!! For efficiency, do your unformatted write all at once. My
> experience has been that


> write(13) a,b


> is faster than


> write(13) a
> write(13) b


> and not just on the rogue compiler with the snail-like direct
> performance. Unformatted, the second form will use more disk space,
> since unformatted generally puts record size information into each
> write.


Well, if a and b are scalar variables, then yes.

If a and b are large arrays, then it may not be much different.

It might even be that some systems have a limit to the record
length, and that large arrays could exceed that limit.

I do sometimes like to do something like:

WRITE(13) N,(A(I),I=1,N)

which allows arrays of unknown length to be written, though
can be dangerous if the file contents are not known. Also,
the array on read must be at least big enough to hold the
data but there is no chance to check it.
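
One way around the "no chance to check" problem, sketched under the assumption that an extra record per array is acceptable: write the count in a record of its own, so the reader can test it against the array bound before reading the data. The file name and the bound nmax are illustrative.

```fortran
program counted_array
  implicit none
  integer, parameter :: nmax = 100     ! capacity of the reader's array
  real    :: a(nmax), b(nmax)
  integer :: n, nread, i

  n = 10
  a(1:n) = (/ (real(i), i = 1, n) /)

  open(unit=13, file='counted.unf', form='unformatted', status='replace')
  write(13) n                       ! count in a record of its own
  write(13) (a(i), i = 1, n)
  close(13)

  open(unit=13, file='counted.unf', form='unformatted', status='old')
  read(13) nread
  if (nread < 0 .or. nread > nmax) stop 'array in file does not fit'
  read(13) (b(i), i = 1, nread)
  close(13)

  print *, 'read', nread, 'values; last =', b(nread)
end program counted_array
```

The cost is one more record, and therefore one more set of record markers, per array.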

-- glen

glen herrmannsfeldt

2004-03-28, 10:30 pm

Madhusudan Singh wrote:

(snip)

> Precisely, which is why I want to use unformatted I/O.


> I know that. I just said that direct access implies unformatted by default
> (according to Chapman Pg. 569.).


And NAMELIST only works on FORMATTED files.

(UNFORMATTED and FORMATTED are the Fortran keywords for what are more
commonly called BINARY and TEXT files.)

I could imagine using NAMELIST on FORMATTED direct access files, but
I don't believe anyone ever implemented it. It would be a little
strange, as the record length is usually not known for NAMELIST, so
one would have to select the maximum possible length. It would then be
inefficient for most cases.


You select the maximum, or break up the I/O into multiple records
of the appropriate length. I have known systems to do checkpointing
with direct access files including a directory structure, and writing
the data into multiple records as appropriate.

Well, one could use direct access, allowing multiple checkpoint
data sets in one file. The program could write over older ones
while keeping the last N for later checking. It is possible that
the program might fail in the middle of writing, maybe due to
power outage or the program being canceled. One should then have
at least two saved copies.
> So, I should go for sequential unformatted I/O ?


I think so. You could write two (or more) files, to keep
multiple checkpoint sets.
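
A minimal sketch of the two-file idea, assuming hypothetical file names ckpt_a.unf and ckpt_b.unf and a five-element stand-in for the simulation state. Successive checkpoints alternate between the files, so a crash in the middle of one write leaves the other file intact. On restart, the newest file that reads cleanly is used.

```fortran
program alternate_checkpoints
  implicit none
  integer, parameter :: lun = 13
  character(len=10), parameter :: fname(2) = (/ 'ckpt_a.unf', 'ckpt_b.unf' /)
  double precision :: state(5), saved(5)
  integer :: step, slot, saved_step, best, ios

  ! Simulation loop: each checkpoint overwrites the older of the two files.
  state = 0.0d0
  do step = 1, 6
     state = state + 1.0d0            ! stand-in for real work
     slot = mod(step, 2) + 1          ! alternates 2, 1, 2, 1, ...
     open(unit=lun, file=fname(slot), form='unformatted', status='replace')
     write(lun) step, state
     close(lun)
  end do

  ! Restart: try both files and keep the newest one that reads cleanly.
  best = -1
  do slot = 1, 2
     open(unit=lun, file=fname(slot), form='unformatted', status='old', &
          iostat=ios)
     if (ios /= 0) cycle              ! file missing: try the other one
     read(lun, iostat=ios) saved_step, saved
     close(lun)
     if (ios == 0 .and. saved_step > best) then
        best = saved_step
        state = saved
     end if
  end do

  print *, 'restart from step', best
end program alternate_checkpoints
```

A truncated file should make the READ return a nonzero iostat, in which case the other file is used. How reliably a half-written record is detected depends on the compiler's record format, so this is worth testing on each platform.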

> That is excellent advice. Do I take it that 'a', 'b' above can be arrays
> (subscripted or otherwise) ?


As I previously wrote, yes, they can be arrays. You can also do

WRITE(13) N,(A(I),I=1,N),(B(I),I=1,N)

-- glen

Madhusudan Singh

2004-03-28, 10:30 pm

On Mon, 29 Mar 2004 01:43:22 +0000, glen herrmannsfeldt wrote:


> As I previously wrote, yes, they can be arrays. You can also do
>
> WRITE(13) N,(A(I),I=1,N),(B(I),I=1,N)


And for sequential unformatted I/O, I do not need to worry about record
lengths, right ? Some of my arrays are really big - double precision with
4000+ elements.
Ken Plotkin

2004-03-28, 10:30 pm

On Sun, 28 Mar 2004 18:55:46 -0500, Madhusudan Singh
<spammers-go-here@nowhere.now> wrote:


>Which compiler was this ? I use Sun f90 and IFC, and will probably add
>NAGware to that list in a few months, so I should know what to watch out
>for :)


Microsoft Powerstation 1.0a. Don't worry - you won't run into it.
It's been out of production for a while, and ran under DOS.

>So, I should go for sequential unformatted I/O ?


That's what I've been recommending. The basic premise is to keep
things simple.

>
>That is excellent advice. Do I take it that 'a', 'b' above can be arrays
> (subscripted or otherwise) ?


Yes. Pay attention to Glen's cautions, though. And run a few tests
before committing to anything in your big simulation program.

Ken Plotkin
Richard Maine

2004-03-29, 1:31 am

Madhusudan Singh <spammers-go-here@nowhere.now> writes:

> According to an example on Pg 572 of Chapman's Fortran 90/95 for
> Scientists and Engineers,


My least favorite of all Fortran books. :-( And one of the reasons
is that it gives lots of advice I disagree with. It sometimes
phrases that advice in ways that make it sound like universally
agreed principles, or even requirements of the standard. But...

> unformatted files are much smaller than their
> formatted counterparts.


Generally true. But this isn't caused by magic. You can't
just open with form='unformatted' and make a file smaller.
There's no "rocket science" here. It is just a matter of
comparing, for example, the 4 bytes that a single precision
real takes in internal storage (and thus also on an unformatted file)
versus the dozen or so characters, depending on the exact format, that
it will take to write out that real in a character form. Namelist
is inherently formatted...and will add the extra space to write
out the variable name.
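
The size comparison can be checked directly. The sketch below asks the compiler for the unformatted length of a default real and compares it with the width of one formatted rendering. The ES15.7 format is only an assumed example, and the IOLENGTH result is in processor-dependent units (bytes on many compilers, 4-byte words on some).

```fortran
program size_compare
  implicit none
  real :: x
  character(len=40) :: text
  integer :: unf_len

  x = 3.14159265

  ! Unformatted: the length of a record holding just x.
  inquire(iolength=unf_len) x

  ! Formatted: the same value written as text with an explicit format.
  write(text, '(es15.7)') x

  print *, 'unformatted length (processor units):', unf_len
  print *, 'formatted width (characters):        ', len_trim(text)
  print *, 'formatted text: [', trim(text), ']'
end program size_compare
```

Namelist output would add the variable name, an equals sign and separators on top of the formatted width.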

> I also read on Pg 569 that :
>
> "Direct access unformatted files whose record length is a multiple of the
> sector size of a particular computer are the most efficient Fortran files
> possible on that computer." (verbatim).


Typical of that book in being a bit oversimplified. Efficiency
is not a simple subject. If you are reading the file sequentially,
direct access may well not be as efficient as sequential. In any
case, I doubt that one could measure the difference for what you
are doing. I think you need to be more concerned about just getting
something that works than with getting the last bit of efficiency.
If you really needed to worry about efficiency at that level, you'd
need to do actual tests instead of relying on advice from a book
(even if it was a book whose advice I respected).

But again, you don't *HAVE* direct access unformatted data. You
don't have anything close to it. Just opening the file that way
won't make your data fit that.

> 5. I do not want to introduce any truncation or rounding errors through
> file I/O alone.


Oh. I didn't pick up on that. Then you better avoid formatted I/O.
It is very tricky to do formatted I/O without rounding errors.
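
A small sketch of the rounding issue, assuming an F12.6 edit descriptor as the example of a format with too few digits. The same double precision value goes through a formatted round trip and an unformatted one.

```fortran
program roundtrip
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: x, y_fmt, y_unf
  character(len=32) :: text

  x = 1.0_dp / 3.0_dp

  ! Formatted round trip: only six decimals survive.
  write(text, '(f12.6)') x
  read(text, '(f12.6)') y_fmt

  ! Unformatted round trip through a scratch file: the bits are copied.
  open(unit=13, form='unformatted', status='scratch')
  write(13) x
  rewind(13)
  read(13) y_unf
  close(13)

  print *, 'formatted round trip exact?  ', y_fmt == x
  print *, 'unformatted round trip exact?', y_unf == x
end program roundtrip
```

This should print F for the formatted case and T for the unformatted one. A wide enough format can also round-trip exactly, but the required digit count depends on the real kind, which is part of what makes it tricky.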

From everything you've described, namelist just makes no sense.
It doesn't fit your application. I recommend forgetting about it.

> So, if I choose to use unformatted I/O, would the following statements be
> ok :
>
> open(unit=666,file='vardump.unf',access='direct',form='unformatted', &
> recl=reclength)
>
> (direct access implies unformatted by default, but I am just making things
> extra explicit here)


I agree with that preference.

> Now reclength would presumably be different for different types and
> arrays of different sizes, etc. Or can one dump only variables of a
> single kind through direct access ?


I think you want to forget the direct access also. You are paying
too much attention to that bit about it being the most efficient
form. That isn't always true, and in any case, the difference is
likely to be minuscule.

> remember reading that reclength is a processor dependent unique number. I


No. The *UNITS* of RECL are processor-dependent. You specify what
record length you want. The processor-dependent thing is whether you
specify this length in terms of 8-bit bytes (the most common),
32-bit words (next most common), or other units (not so common any
more).
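
For anyone who does need direct access, a sketch of the portable way around the units question: INQUIRE with IOLENGTH= returns the length of a given output list in whatever units RECL= expects on that compiler. The variable names, array sizes and the file name direct.unf are assumptions for the illustration.

```fortran
program recl_from_iolength
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: lun = 13
  integer :: k(3), kin(3), reclength
  real(dp) :: v(4000), vin(4000)

  k = (/ 1, 2, 3 /)
  call random_number(v)

  ! Ask the processor how long a record holding this exact list must be.
  ! The answer is already in the units that RECL= uses here.
  inquire(iolength=reclength) k, v

  open(unit=lun, file='direct.unf', access='direct', form='unformatted', &
       recl=reclength, status='replace')
  write(lun, rec=1) k, v
  read(lun, rec=1) kin, vin
  close(lun, status='delete')

  print *, 'record length in RECL units:', reclength
  print *, 'round trip exact:', all(kin == k) .and. all(vin == v)
end program recl_from_iolength
```

The output list after IOLENGTH= must be an ordinary list of variables. A namelist group name is not accepted there, which is consistent with the compiler complaint reported at the start of the thread.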

But force-fitting data into fixed length records is a big pain.
It can be done. I've done it. You can just regard the records
as buffers for storing bits, doing your own record management at
another level of abstraction. But I'm *NOT* going to help someone
through that who doesn't really need it. That would be too painful
for both of us. It's a lot of bother. And you don't really need it.

Just use unformatted sequential. It seems the perfect match for
your needs. It is much simpler. You won't notice the performance
difference. Heck, if anything, I'd bet performance will be better;
using something well-matched to the purpose often is good for
performance.

> know from my earlier work that storing a real (for instance, IEEE
> 754) takes more bits than storing an integer does.


That's not in general true. There are many different sizes of
integers and reals. In fact, the Fortran standard *REQUIRES* that
default integers and reals take the same space; for nondefault ones,
matters are different. But this turns out to be a side issue for
your application. Once you forget about using direct access, you
won't need to pay attention to this.

> Thanks for pointing that out. I wanted to use a namelist because it
> considerably simplifies my coding. If I need a variable to be backed up,
> I just add it to the namelist and let the write statement figure out how
> it wishes to handle it. And use unformatted I/O. Such delusions are cured
> now :)


Just plain old ordinary simple sequential unformatted will do this
with no extra complication in the coding. I think you are seeing
problems that aren't there. What namelist does is add things
like x= (if x is the variable name) to the output. You don't
need that. If you just write x, and then later read x, it will
do what you want.
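
A minimal sketch of such a checkpoint, assuming a handful of hypothetical variables of mixed type. One WRITE dumps them all into a single record, and one READ with the same list in the same order restores them. No record length is specified anywhere.

```fortran
program checkpoint_sequential
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: lun = 13
  integer :: step, counts(10), step_in, counts_in(10)
  real(dp) :: energy, field(4000), energy_in, field_in(4000)

  ! Stand-in simulation state.
  step = 42
  counts = 7
  energy = -1.5_dp
  call random_number(field)

  ! Dump: sequential access is the default, so only FORM= is needed.
  open(unit=lun, file='vardump.unf', form='unformatted', status='replace')
  write(lun) step, counts, energy, field
  close(lun)

  ! Restore: same variables, same order.
  open(unit=lun, file='vardump.unf', form='unformatted', status='old')
  read(lun) step_in, counts_in, energy_in, field_in
  close(lun, status='delete')

  print *, 'restored exactly:', step_in == step .and. &
       all(counts_in == counts) .and. energy_in == energy .and. &
       all(field_in == field)
end program checkpoint_sequential
```

Adding a variable to the checkpoint means adding it to both lists, which is the only bookkeeping that namelist would have saved.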

> Well, I have something like 10-15 real (double precision) arrays each with
> 4000+ entries. So, I think space could be a problem. Some of the machines
> I run this program on, have relatively limited diskspace.


That sounds like 4000*15*8 = about half a megabyte. In this day
and age, that's not relatively limited. That's half of a floppy.
Unless perhaps you are writing a lot of records like this. From part
of your description, that might be.

> I guess I was trying to have the best of both worlds. Ease of coding as
> well savings in disk space.


See above. Namelist isn't really helping the ease of coding either.
Really, there is not a single property of namelist that is appropriate
to your task.

--
Richard Maine
email: my last name at domain
domain: sumertriangle dot net
glen herrmannsfeldt

2004-03-29, 4:41 am

Richard Maine wrote:

> Madhusudan Singh <spammers-go-here@nowhere.now> writes:


(snip)


My favorite saying, though I don't use it very often:

"All generalizations are false, including this one."
> Generally true. But this isn't caused by magic. You can't
> just open with form='unformatted' and make a file smaller.
> There's no "rocket science" here. It is just a matter of
> comparing, for example, the 4 bytes that a single precision
> real takes in internal storage (and thus also on an unformatted file)
> versus the dozen or so characters, depending on the exact format, that
> it will take to write out that real in a character form. Namelist
> is inherently formatted...and will add the extra space to write
> out the variable name.


NAMELIST input tends to be more space efficient in many
cases, as you can give variables default values and only change
the ones that need to be changed. NAMELIST output I never
found very useful, it is just too much work.

WRITE(6,*) 'X=',X

is much easier than defining a NAMELIST, and then using it.
(Consider for debugging purposes.)

(I will keep the PL/I references low, but it is just too obvious here:
The PL/I equivalent is PUT DATA(X); No separate NAMELIST needed,
just the variables desired, very convenient for debugging.
Even more fun, the default if no variable list is specified is all
variables in the program unit, for either input or output.)

> Typical of that book in being a bit oversimplified. Efficiency
> is not a simple subject. If you are reading the file sequentially,
> direct access may well not be as efficient as sequential. In any
> case, I doubt that one could measure the difference for what you
> are doing. I think you need to be more concerned about just getting
> something that works than with getting the last bit of efficiency.
> If you really needed to worry about efficiency at that level, you'd
> need to do actual tests instead of relying on advice from a book
> (even if it was a book whose advice I respected).


> But again, you don't *HAVE* direct access unformatted data. You
> don't have anything close to it. Just opening the file that way
> won't make your data fit that.


There are times when it is useful, but they are rare. The one case
I know pretty much defined a file system inside a direct access file
for use by the program. Some blocks were like directory blocks,
others were data blocks. If there is no natural size to the
data, you might just as well pick a nice number. Though on systems
that add block headers and trailers, you would have to include that.

> Oh. I didn't pick up on that. Then you better avoid formatted I/O.
> It is very tricky to do formatted I/O without rounding errors.


Java was designed to do it, and I hate it. The default conversions
are guaranteed to supply enough digits that when converted back again
you get the original bits. (IEEE is required.) That means a high
probability of output like 1.999999999999999 when everyone knows the
answer is 2, and list directed output in all other languages says 2.

> From everything you've described, namelist just makes no sense.
> It doesn't fit your application. I recommend forgetting about it.


(snip)

> I think you want to forget the direct access also. You are paying
> too much attention to that bit about it being the most efficient
> form. That isn't always true, and in any case, the difference is
> likely to be minuscule.


Yes. But anyway, you can READ and WRITE all types of variables. I am
not sure what special rules might apply to CHARACTER variables, but as
far as I know the whole thing gets written out.

> No. The *UNITS* of RECL are processor-dependent. You specify what
> record length you want. The processor-dependent thing is whether you
> specify this length in terms of 8-bit bytes (the most common),
> 32-bit words (next most common), or other units (not so common any
> more).


Well, as the sizes of the types could be processor dependent, so
would the length if it is in bytes. With the popularity of IEEE
floating point and 32 bit integers that isn't likely.

> But force-fitting data into fixed length records is a big pain.
> It can be done. I've done it. You can just regard the records
> as buffers for storing bits, doing your own record management at
> another level of abstraction. But I'm *NOT* going to help someone
> through that who doesn't really need it. That would be too painful
> for both of us. It's a lot of bother. And you don't really need it.


If you are lucky, all records are the same size. I was just trying
to remember the last time I used Fortran direct access I/O, except
that I think it didn't actually use it. About 1978 I ported the BASIC
program called PCAVES (public caves) to PDP-10 Fortran. I might have
had assembler programs to read and write disk blocks because PDP-10
Fortran didn't have direct access. Also, assembler programs to do
record locking, so that multiple copies could run without trashing
the file. It really isn't used often, but when it is you will know
it because it is the only solution to the problem.

> Just use unformatted sequential. It seems the perfect match for
> your needs. It is much simpler. You won't notice the performance
> difference. Heck, if anything, I'd bet performance will be better;
> using something well-matched to the purpose often is good for
> performance.




> Just plain old ordinary simple sequential unformatted will do this
> with no extra complication in the coding. I think you are seeing
> problems that aren't there. What namelist does is add things
> like x= (if x is the variable name) to the output. You don't
> need that. If you just write x, and then later read x, it will
> do what you want.


There is one advantage to NAMELIST, but it isn't hard to fix. With
NAMELIST your files are backward compatible if you need to add
variables, assuming you can give them default values. The fix is
to add a version number at the beginning of the file. The first thing
you write, and also the first thing you read is the version number.
You can then verify that the data you are reading is what you are
supposed to be reading, and complain if it is wrong. Sometimes
it is worthwhile to have the reader recognize older file versions,
and read them, otherwise print a message and end.
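
A sketch of the version-number scheme, assuming a hypothetical history in which layout 1 stored step and x, and layout 2 added y. The version goes in its own first record and the reader branches on it.

```fortran
program versioned_checkpoint
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: lun = 13
  integer, parameter :: current_version = 2   ! hypothetical layout number
  integer :: version, step
  real(dp) :: x(100), y(100)

  step = 5
  x = 1.0_dp
  y = 2.0_dp

  ! Writer: the version is the first thing in the file.
  open(unit=lun, file='ckpt.unf', form='unformatted', status='replace')
  write(lun) current_version
  write(lun) step, x, y
  close(lun)

  ! Reader: check the version before touching the data.
  open(unit=lun, file='ckpt.unf', form='unformatted', status='old')
  read(lun) version
  select case (version)
  case (1)
     ! Assumed older layout without y: read what exists, default the rest.
     read(lun) step, x
     y = 0.0_dp
  case (2)
     read(lun) step, x, y
  case default
     print *, 'unrecognised checkpoint version', version
     stop 1
  end select
  close(lun, status='delete')

  print *, 'read version', version, 'at step', step
end program versioned_checkpoint
```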

> That sounds like 4000*15*8 = about half a megabyte. In this day
> and age, that's not relatively limited. That's half of a floppy.
> Unless perhaps you are writing a lot of records like this. From part
> of your description, that might be.


It might be better not to write records that big.
(Each WRITE writes a new record.) Though others will say what
good and bad sizes are.

(snip)

-- glen

Keith Lindsay

2004-03-29, 12:34 pm

Madhusudan Singh wrote:
> Correct. All I am trying to accomplish is the following :
>
> 1. At regular intervals, dump some variables to a file on the disk so that
> my long Monte Carlo simulation can be restarted from where it left off.


You should be aware that on some platforms, namelist output does not
provide enough digits to exactly recover the value when it is read back
in. That is, if you write out a value and read it back in, the read in
value may differ slightly from the original. The effective perturbation
may or may not be significant in your application.

Keith

Richard Maine

2004-03-29, 12:34 pm

glen herrmannsfeldt <gah@ugcs.caltech.edu> writes:

> I was just trying
> to remember the last time I used Fortran direct access I/O,


I use it quite regularly for 2 reasons (neither of which is
applicable to the OP here).

1. "Portability" and access to non-Fortran files. Although the
standard doesn't require it, implementations of unformatted
direct access tend to write data with no compiler-dependent
record headers or trailers. Those implementations that don't
do it by default typically can be convinced to do it that way
with some compiler switch. In practice, every full-language
compiler that I've had occasion to try in a long time could
be convinced to work this way. The only recent exception I recall
is the subset ELF90 (which omits the switch that its "grown up"
full language sibling has).

Thus, unformatted direct access is my preferred way to
deal with non-Fortran formats that don't have Fortran record
formats.

Similarly, I use it for file formats that I design for portability
between different machines. Of course, formatted sequential is
the most portable, but sometimes that just isn't acceptable. I
started using direct access unformatted files for this about 2
decades ago, when I got user complaints that the formatted files
I had been using for transfer between machines were just too big
and slow. Yes, you have to worry about byte sex, but that's not
difficult. Handling conversion between different native floating
point formats isn't too difficult either - I've done it.

The one big thing you give up is any pretext of being portable to
machines that don't have 8-bit bytes (or anyway a power of 2).
Using those files on something like an old 60-bit CDC is just
something I wouldn't want to deal with. I won't say it couldn't
be done, but I wouldn't want to be the one to do it. Pain.

With f2003, stream I/O will generally be a better way to do this.
It avoids lots of complications that direct access unformatted
has for this. (One of the larger of those complications is the
fixed record-length thing.) You can do stream I/O today, but
it is compiler-dependent. Sometimes, if portability to other
compilers isn't required, I've even been known to suggest the
compiler-dependent forms of stream I/O rather than have relative
novices deal with the complications of managing their own fixed-length
buffers.

2. The other place I use them is for files that I actually want the
direct access feature. This is common for flight research data.
We might have an hour of data from a flight, but want to extract
only a 10-second segment for some particular analysis.
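
A sketch of the second use, with made-up numbers: one fixed-length record per time sample, and a read that touches only the records of interest. The channel count, sample count and segment bounds are assumptions, and a scratch file stands in for the recorded data.

```fortran
program extract_segment
  implicit none
  integer, parameter :: lun = 13
  integer, parameter :: nchan = 4        ! channels per sample (assumed)
  integer, parameter :: nsamples = 1000  ! samples in the whole file (assumed)
  integer, parameter :: first = 500, last = 510
  real :: frame(nchan), segment(nchan, first:last)
  integer :: reclength, i

  inquire(iolength=reclength) frame
  open(unit=lun, access='direct', form='unformatted', recl=reclength, &
       status='scratch')

  ! Stand-in for the recorded data: one record per sample.
  do i = 1, nsamples
     frame = real(i)
     write(lun, rec=i) frame
  end do

  ! Extraction: jump straight to the wanted records, skipping the rest.
  do i = first, last
     read(lun, rec=i) segment(:, i)
  end do
  close(lun)

  print *, 'first sample of segment:', segment(1, first)
  print *, 'last sample of segment: ', segment(1, last)
end program extract_segment
```

With a fixed sample rate, the record number for a given time is simple arithmetic, which is what makes direct access a natural fit for this kind of data.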

--
Richard Maine
email: my last name at domain
domain: sumertriangle dot net
Tim Prince

2004-03-29, 11:32 pm


"Richard Maine" <nospam@see.signature> wrote in message
news:m2hdw8jj6p.fsf@vega.dsl.att.net...
>
> Just use unformatted sequential. It seems the perfect match for
> your needs. It is much simpler. You won't notice the performance
> difference. Heck, if anything, I'd bet performance will be better;
> using something well-matched to the purpose often is good for
> performance.
>

I once cut run time of a program from 8 to 6 hours, when I noticed that an
unformatted direct access file was written and read always in sequential
order. That system performed asynchronous operations automatically on
sequential files.
It's doubly strange that the author would say direct access is faster, after
noticing how dependent it may be on matching block sizes with the platform
of the moment.


Clive Page

2004-03-30, 2:39 pm

In message <Y%5ac.16355$Xl2.10841@newssvr27.news.prodigy.com>, Tim
Prince <tprince@computer.org> writes
>It's doubly strange that the author would say direct access is faster, after
>noticing how dependent it may be on matching block sizes with the platform
>of the moment.


I believed for some years that direct-access would be faster if the
record-size was 512 bytes or some nice multiple of it, as that seems so
logical. Then having recommended it to a colleague, and feeling guilty
that I really didn't know if it would speed up his program, I tried it
on a couple of platforms with records of 511, 512, 513, and similarly
around 1024 and 2048. There was a tiny difference, but really not worth
bothering about. I suspect it was once true, but modern disc systems
and operating systems use such clever buffering that it's no longer a
useful measure.

It wasn't even all that inefficient to use direct-access FORMATTED I/O
with a record length of 1, which gives you a byte-by-byte access to a
file (on most systems) rather like the new-fangled stream access.
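
For comparison, a sketch of byte-level access using stream I/O as standardised in Fortran 2003, assuming a compiler that implements ACCESS='STREAM' and a file storage unit of one byte. The file name bytes.bin is made up for the illustration.

```fortran
program stream_bytes
  implicit none
  integer, parameter :: lun = 13
  character(len=1) :: c

  ! Stream access: no records, just a sequence of file storage units.
  open(unit=lun, file='bytes.bin', access='stream', form='unformatted', &
       status='replace')
  write(lun) 'ABCDEFGH'

  ! POS= positions the file by storage unit, counting from 1.
  read(lun, pos=5) c
  close(lun, status='delete')

  print *, 'fifth byte: ', c
end program stream_bytes
```

On the usual byte-addressed systems this prints E. Unlike direct access with a record length of 1, a single stream READ can fetch any number of bytes starting at the chosen position.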

--
Clive Page
Gary L. Scott

2004-03-30, 5:45 pm

Clive Page wrote:
>
> In message <Y%5ac.16355$Xl2.10841@newssvr27.news.prodigy.com>, Tim
> Prince <tprince@computer.org> writes
>
> I believed for some years that direct-access would be faster if the
> record-size was 512 bytes or some nice multiple of it, as that seems so
> logical. Then having recommended it to a colleague, and feeling guilty
> that I really didn't know if it would speed up his program, I tried it
> on a couple of platforms with records of 511, 512, 513, and similarly
> around 1024 and 2048. There was a tiny difference, but really not worth
> bothering about. I suspect it was once true, but modern disc systems
> and operating systems use such clever buffering that it's no longer a
> useful measure.
>
> It wasn't even all that inefficient to use direct-access FORMATTED I/O
> with a record length of 1, which gives you a byte-by-byte access to a
> file (on most systems) rather like the new-fangled stream access.


My, not in my experience. I've seen differences between "binary" and
reading 1-byte direct access of factors of 10's and 100's for large
files. Probably insignificant for small files though.

>
> --
> Clive Page



--

Gary Scott
mailto:garyscott@ev1.net

Fortran Library: http://www.fortranlib.com

Support the Original G95 Project: http://www.g95.org
-OR-
Support the GNU GFortran Project: http://gcc.gnu.org/fortran/index.html

Why are there two? God only knows.

Democracy is two wolves and a sheep, voting on what to eat for dinner...
Liberty is a well armed sheep contesting the vote. - Thomas Jefferson
Richard Maine

2004-03-30, 5:45 pm

"Gary L. Scott" <garyscott@ev1.net> writes:

> Clive Page wrote:


>
> My, not in my experience. I've seen differences between "binary" and
> reading 1-byte direct access of factors of 10's and 100's for large
> files. Probably insignificant for small files though.


Yes. Clive's comment so surprised me that I was tempted to throw
together a quick test to see if things had changed while I wasn't
looking. The factors of 10's to 100's sound more like what I'd
expect and recall.

You really claim that the difference between, say

real :: x(1000)
...
read(lun,rec=n) x

versus

character :: c
...
do i = 1 , 4000
...
read(lun,'(a1)',rec=n) c
...

can be small? I'd think the overhead of 4,000 read statements
instead of one would be bad enough, even without the issue
of formatted I/O. Heck, I've seen problems with the performance
difference between

read(lun,rec=n) x

vs

read(lun,rec=n) (x(i),i=1,1000)

And this isn't a particularly extreme example. Reading 1000
reals in a chunk seems fairly modest for such cases.

Of course, if your base for comparison is formatted instead of
unformatted, things might be different. My rule of thumb (which
I haven't rechecked recently) is that formatted is about an
order of magnitude slower than unformatted. If you've already
taken one order of magnitude hit by going to formatted, I guess
I find it a little more plausible that the extra hit from reading
one byte at a time might not add that much more.
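
Since everyone agrees that the numbers are platform-dependent, a rough timing sketch may be more use than a rule of thumb. It compares one unformatted direct access READ per 1000-element array with one READ per element. This is a simplified variant of the comparison above, not the formatted one-byte case, and the repeat count is an arbitrary assumption.

```fortran
program read_granularity
  implicit none
  integer, parameter :: n = 1000, nrep = 200   ! nrep is arbitrary
  real :: x(n), y(n)
  integer :: recl_all, recl_one, i, r
  integer :: t0, t1, t2, rate

  call random_number(x)
  inquire(iolength=recl_all) x
  inquire(iolength=recl_one) x(1)

  ! Unit 21 holds the array as one record, unit 22 as n small records.
  open(unit=21, access='direct', form='unformatted', recl=recl_all, &
       status='scratch')
  open(unit=22, access='direct', form='unformatted', recl=recl_one, &
       status='scratch')
  write(21, rec=1) x
  do i = 1, n
     write(22, rec=i) x(i)
  end do

  call system_clock(t0, rate)
  do r = 1, nrep
     read(21, rec=1) y              ! one READ per array
  end do
  call system_clock(t1)
  do r = 1, nrep
     do i = 1, n
        read(22, rec=i) y(i)        ! one READ per element
     end do
  end do
  call system_clock(t2)

  ! Assumes the processor has a clock (rate > 0).
  print *, 'one read per array  :', real(t1 - t0) / real(rate), 's'
  print *, 'one read per element:', real(t2 - t1) / real(rate), 's'
  close(21)
  close(22)
end program read_granularity
```

Both files are small enough to sit in the operating system's cache, so the sketch mostly measures per-statement overhead in the run-time library rather than disk speed.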

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain | experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain
Clive Page

2004-03-31, 4:33 pm

In message <m1ekraou3s.fsf@macfortran.local>, Richard Maine
<nospam@see.signature> writes
>Of course, if your base for comparison is formatted instead of
>unformatted, things might be different. My rule of thumb (which
>I haven't rechecked recently) is that formatted is about an
>order of magnitude slower than unformatted.


I'd agree with that rule of thumb. My comment about being not "all
that" much slower, was relative to a formatted read, my point being that
the record-length of 1 didn't seem to make things much worse. But I was
only using it to read smallish files. I'm sure a genuine "binary"
access-mode would be much faster.

I'm almost tempted to do some re-tests, but don't have the time at
present, and I'm sure it's very platform and compiler-dependent.

--
Clive Page

Copyright 2008 codecomments.com