Reading MS-FORTRAN unformatted binary files *efficiently*
| Jeff Godfrey 2005-04-18, 8:59 pm |
| Hi All,
I am writing some software that unfortunately is required to read MS-Fortran
unformatted sequential access binary files. The file format is odd in
nature (at least to me), as the records can vary in length, but are
organized in chunks of 130 bytes or less, called "physical blocks". For
those interested, the format is fairly well described here (in the
"Unformatted Sequential Files" section):
http://www.tacc.utexas.edu/services...ug1/pggfmsp.htm
I have a routine (below) that correctly reads the files and returns the data
in the chunks I need. Unfortunately, this routine can get called 100K
times or more in the process of reading a single file. With that in mind,
I'm looking for advice to make the routine more efficient. I'm guessing
there may be lots of places for improvement, as this is my first real
attempt at reading binary data via tcl.
Thanks for any improvements...
Jeff Godfrey
offset --> location in file to begin read
retVar --> data is returned in this variable
format --> binary scan format character (c s i f or d)
count --> number of "format" characters to read
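addPad --> controls whether the read request should be padded at 128-byte
           boundaries (1 = account for leader/trailer bytes, 0 = raw read)
The data to read is already stored in the namespace variable "binaryData".
The routine returns the number of bytes consumed from "binaryData".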
-----------------------------------------------------------------
proc ::msio::binaryScan {offset retVar format count {addPad 1}} {
    upvar $retVar data
    set offsetSave $offset
    variable binaryData
    set data [list]

    # --- make sure we have a valid format request
    if {[string first $format "csifd"] < 0} {
        return -code error "Invalid format statement - $format"
    }

    # --- store the number of bytes for each format char
    array set bytes {c 1 s 2 i 4 f 4 d 8}

    # --- If addPad is 0, just do a raw read of the requested data. That is,
    #     don't pad the read with leader and trailer bytes...
    if {!$addPad} {
        binary scan $binaryData @${offset}$format$count data
        incr offset [expr {$bytes($format) * $count}]
    } else {
        # --- determine the byte length of the requested read. If it exceeds
        #     128, the FORTRAN file will have been written in 128-byte records
        #     with each record being surrounded by its own "leader" and
        #     "trailer" bytes.
        set readLen [expr {$bytes($format) * $count}]
        set thisFormat ""

        # --- format too large, break it down...
        if {$readLen > 128} {
            # --- find the number of <format> width reads that fit into a
            #     128-byte string.
            set fullRec [expr {128 / $bytes($format)}]
            set thisFormat "c1${format}${fullRec}c1"
            while {$readLen > 128} {
                binary scan $binaryData @${offset}$thisFormat leader \
                    thisData trailer
                incr offset [expr {($bytes($format) * $fullRec) + 2}]
                set data [concat $data $thisData]
                incr readLen -128
            }
            # --- read whatever is left in the final, shorter record
            set remainder [expr {$readLen / $bytes($format)}]
            binary scan $binaryData @${offset}c1${format}${remainder}c1 \
                leader thisData trailer
            incr offset [expr {($bytes($format) * $remainder) + 2}]
            set data [concat $data $thisData]
        } else {
            binary scan $binaryData @${offset}c1${format}${count}c1 \
                leader data trailer
            incr offset [expr {($bytes($format) * $count) + 2}]
        }
    }

    # --- return the number of bytes consumed by this read
    return [expr {$offset - $offsetSave}]
}
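One possible restructuring, offered only as a sketch: instead of running one "binary scan" per physical block and concatenating the resulting lists, copy the data bytes of each block into a single byte string and decode that string with one "binary scan". The proc name scanBlocks and its argument order are hypothetical, and the layout assumed is the one the routine above already relies on (one leader byte, up to 128 data bytes, one trailer byte per block). The marker byte values in the demo data are placeholders, since the sketch never inspects them. Whether this is actually faster for a given file is something to measure with "time".

```tcl
# Hypothetical helper (not from the thread): decode one padded read by
# gathering the data bytes first and scanning them in a single call.
# Assumed layout per physical block: 1 leader byte, <=128 data bytes,
# 1 trailer byte. Leader/trailer values are skipped, never checked.
proc scanBlocks {raw offset format count} {
    array set bytes {c 1 s 2 i 4 f 4 d 8}
    set remaining [expr {$bytes($format) * $count}]
    set payload ""
    set pos $offset
    while {$remaining > 0} {
        # number of data bytes held by this physical block
        set n [expr {$remaining > 128 ? 128 : $remaining}]
        # skip the leader byte and copy the n data bytes
        append payload [string range $raw [expr {$pos + 1}] [expr {$pos + $n}]]
        # step over leader + data + trailer
        incr pos [expr {$n + 2}]
        incr remaining -$n
    }
    binary scan $payload $format$count values
    # return the decoded values and the number of bytes consumed
    return [list $values [expr {$pos - $offset}]]
}

# Demo data: 40 32-bit integers (160 bytes) split into a 128-byte block
# and a 32-byte block. The 0 marker bytes are placeholders.
set ints {}
for {set i 0} {$i < 40} {incr i} { lappend ints $i }
set raw [binary format ci32c 0 [lrange $ints 0 31] 0]
append raw [binary format ci8c 0 [lrange $ints 32 39] 0]

set result [scanBlocks $raw 0 i 40]
puts "values: [lindex $result 0]"
puts "bytes consumed: [lindex $result 1]"
```

The return value pairs the decoded list with the byte count so the caller can advance its own offset, much as the original routine's return value allows.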
| |
| Simon Geard 2005-04-19, 8:57 am |
| Jeff Godfrey wrote:
> I am writing some software that unfortunately is required to read MS-Fortran
> unformatted sequential access binary files. [...]
> I'm looking for advice to make the routine more efficient.
Perhaps for efficiency it would be better to write a fortran extension
to do the reading. Try http://wiki.tcl.tk/3359 for an example on how to
do this.
Simon Geard
| |
| Arjen Markus 2005-04-20, 8:58 am |
| Jeff Godfrey wrote:
> I am writing some software that unfortunately is required to read MS-Fortran
> unformatted sequential access binary files. [...]
> I'm looking for advice to make the routine more efficient.
Is the program that produces these files still in use? Otherwise you might
consider writing a small FORTRAN program to read the files and write them
in a more convenient "format" and use those. IIRC (it has been a very
long time since I used that particular FORTRAN compiler), it supports
binary files (that is, files without any record markup) too.
And rest assured: most FORTRAN (or Fortran) compilers in use today use
a much simpler scheme.
Regards,
Arjen
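To illustrate why a file without record markup is more convenient on the Tcl side, here is a small sketch. It assumes a hypothetical converted file that holds nothing but single-precision floats in the machine's native byte order; the file name raw_demo.bin and the sample values are made up for the example. With no leader or trailer bytes to step over, the whole file decodes in one "binary scan".

```tcl
# Hypothetical file name; the file is created here only so that the
# example is self-contained, and it is deleted again at the end.
set path [file join [pwd] raw_demo.bin]

# Write four native-format single-precision floats with no record markup.
set out [open $path w]
fconfigure $out -translation binary
puts -nonewline $out [binary format f4 {1.0 2.5 -3.0 4.25}]
close $out

# Read the file back: one read, one scan, no block bookkeeping.
set in [open $path r]
fconfigure $in -translation binary
set raw [read $in]
close $in
binary scan $raw f* values
file delete $path

puts "read [llength $values] values: $values"
```

The "f" format uses the native byte order of the machine running the script, so a real converter and reader would have to agree on byte order.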
| |
| Jeff Godfrey 2005-04-20, 4:00 pm |
|
"Simon Geard" <simon@quintic.co.uk> wrote in message
news:4264c106$0$94553$ed2619ec@ptn-nntp-reader01.plus.net...
> Perhaps for efficiency it would be better to write a fortran extension to
> do the reading. Try http://wiki.tcl.tk/3359 for an example on how to do
> this.
Simon,
While I had seen that page before, I hadn't even considered it for the
problem at hand. I'll look at it a bit closer. Thanks for pointing it out.
Jeff
| |
| Jeff Godfrey 2005-04-20, 4:00 pm |
|
"Arjen Markus" <arjen.markus@wldelft.nl> wrote in message
news:4266110C.17C53BBA@wldelft.nl...
> Is the program that produces these files still in use? Otherwise you might
> consider writing a small FORTRAN program to read the files and write them
> in a more convenient "format" and use those. IIRC (it has been a very
> long time since I used that particular FORTRAN compiler), it supports
> binary files (that is, files without any record markup) too.
>
> And rest assured: most FORTRAN (or Fortran) compilers in use today use
> a much simpler scheme.
Arjen,
Yep, the software that produces these files is still in use. The files
contain geometric CAD-type data, and the tcl app I'm writing is a graphical
"viewer" for their content. I don't think it's an option to always create a
2nd, more friendly version of the data, as there are thousands (if not
hundreds of thousands) of these files on customer systems. Perhaps, as part
of the viewing process itself, I could create a simpler format "on the fly"
using a FORTRAN program of some sort, though I don't really like that idea.
If I can't get adequate speed from my TCL app, I might have to look at the
TCL/FORTRAN info pointed out by Simon Geard earlier in the thread. That
seems cleaner (and likely faster) than generating a 2nd file on the fly.
Thanks for the input.
Jeff
| |
| J. F. Cornwall 2005-04-21, 8:58 pm |
| Jeff Godfrey wrote:
> [...]
> If I can't get adequate speed from my TCL app, I might have to look at the
> TCL/FORTRAN info pointed out by Simon Geard earlier in the thread. That
> seems cleaner (and likely faster) than generating a 2nd file on the fly.
Alternatively, you might look into the modern Fortran compilers and see
if you can write a complete Fortran app that can read the files produced
by the (antique) MS-Fortran compiler. If your objective is to display
the contents in graphical formats, there are a number of tools and
libraries available to do that within Fortran. And, the folks over in
comp.lang.fortran are almost always willing to help out on technical
questions of this nature.
Jim C
| |
| Peter Flynn 2005-04-23, 3:58 am |
| J. F. Cornwall wrote:
> Alternatively, you might look into the modern Fortran compilers and see
> if you can write a complete Fortran app that can read the files produced
> by the (antique) MS-Fortran compiler. [...]
The format rings all kinds of bells. 130-byte blocks is a relic of writing
to tape (130 because the line-out buffer on very old kit like the Rank
Xerox Sigma was limited to 132 bytes because it was also used for holding
a lineprinter line -- the remaining two bytes were used for signals :-)
Variable record length on a fixed-block device was a stone XXXXX to
implement, and the only program I ever found which could really crack it
apart at high speed was a thing called CHESTR, which we used to use for
sucking data out of weirdo client-format tapes onto disk so we could see
what was in it, and write a program to read it and do something sensible.
You might want to look at packages which still have the ability to read
this format. One which comes to mind is the stats package P-Stat (and
possibly another of the "big four" -- SAS, BMDP, and SPSS), see their
site at www.pstat.com
///Peter
--
sudo sh -c "cd /;/bin/rm -rf `which killall kill ps shutdown mount gdb` *
&;top"