Read an ascii file into a string.
| titusjan@gmail.com 2006-01-18, 7:03 pm |
| Dear all.
I'm trying to read a complete ascii file into a single character string
so I can write it into an HDF4 file dataset.
I've found a way to read the length of the file so I can allocate my
array in advance. I understand that with unformatted output every
record corresponds, therefore I'm using unformatted output.
I'm reading an unformatted string but I get an error that I read past
the end of the file.
This is my code:
  subroutine ingest_config_file(file_name, config_str )
    implicit none
    character (len=*), intent(in) :: file_name
    character , intent(inout) :: config_str(:)
    integer :: ios
    character :: chr
    character (3) :: temp
    integer :: i, fs

    Open (unit = 777, File = file_name, action = "read", iostat = ios, &
          form="UNFORMATTED")
    If (ios /= 0) Then
      Write (Unit=*, fmt='(A,I4,A,A)') &
        'Unable to open file (ios= ', ios, &
        ') file = ', file_name
      Stop 1
    Endif

    fs = size(config_str)
    print *, "file size: ", fs
    do i = 1, fs
      read (777) config_str(i)
      print *,"chr: ", config_str(i)
    enddo
    Close (777)
  endsubroutine ingest_config_file
This is the output:
file size: 36
forrtl: severe (24): end-of-file during read, unit 777, file
/home/pepijn/programming/pruts/hdf/toet
Image PC Routine Line Source
test_sds 080E3BB4 Unknown Unknown Unknown
test_sds 080E36AC Unknown Unknown Unknown
test_sds 080C1C65 Unknown Unknown Unknown
test_sds 080A0AE0 Unknown Unknown Unknown
test_sds 080A0F83 Unknown Unknown Unknown
test_sds 080ADACD Unknown Unknown Unknown
test_sds 08054CD4 test_sds_.ingest_ 274
test_sds.f90
test_sds 08052295 MAIN__ 38
test_sds.f90
test_sds 0804ABE8 Unknown Unknown Unknown
Unknown B7BE0ED0 Unknown Unknown Unknown
test_sds 0804AAA1 Unknown Unknown Unknown
So, summarizing: I don't know how to do it with formatted input because
lines differ in length and I get an error with unformatted input as
shown above.
Any help would be appreciated.
Regards, Pepijn.
| |
| Richard E Maine 2006-01-18, 7:03 pm |
| <titusjan@gmail.com> wrote:
> I'm trying to read a complete ascii file into a single character string
> so I can write it into an HDF4 file dataset.
>
> I've found a way to read the length of the file so I can allocate my
> array in advance.
> I understand that with unformatted output every
> record corresponds, therefore I'm using unformatted output.
I can't parse that sentence. Corresponds to what? I guess it doesn't
matter, though.
> I'm reading an unformatted string
There is no such thing as an "unformatted string", but from context, you
mean that you are reading the file as an unformatted sequential file.
You can't do that. Period. A text file is not an unformatted sequential
file, and you can't read it as one. (Well, some operating systems are
smart enough to handle the trick, but not any systems that are in very
much use today). An unformatted sequential file has extra information
buried in it to define record boundaries. Since a text file won't have
that information, things will get confused (i.e. they won't work) when
you open the file as unformatted sequential and the expected information
isn't found.
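The "extra information" can be made visible with a small experiment. The following is a minimal sketch, not taken from the thread: it writes one 3-character record to a hypothetical scratch file (markers_demo.bin, a made-up name) as unformatted sequential and then asks for the file size with the F2003 INQUIRE(SIZE=) specifier. The layout of the record markers is compiler-dependent, so no particular size is promised here; on common Unix compilers the reported size is larger than the 3 bytes of payload.

```fortran
program record_markers
  implicit none
  integer, parameter :: lun = 20                              ! arbitrary unit number
  character(len=*), parameter :: fname = 'markers_demo.bin'   ! hypothetical scratch file
  character(len=3) :: payload
  integer :: nbytes

  payload = 'abc'

  ! Write a single 3-character record as unformatted sequential.
  open(lun, file=fname, form='unformatted', access='sequential', &
       status='replace', action='write')
  write(lun) payload
  close(lun)

  ! F2003: ask how many file storage units (normally bytes) are on disk.
  ! A negative value means the size could not be determined.
  inquire(file=fname, size=nbytes)

  print *, 'payload length :', len(payload)
  print *, 'file size      :', nbytes
end program record_markers
```

Any difference between the two numbers is the record structure that a plain text file does not contain, which is why the unformatted READ in the original subroutine runs off the end of the file.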
You are probably thinking of unformatted as being like what is called
stream in f2003 - just the raw data, with no "extra" stuff. It isn't.
There are non-standard ways to do the equivalent of f2003 stream in f95
compilers (and some of them even implement the f2003 syntax). That could
be made to work for you, but I doubt it would be my first
recommendation.
There are several possibilities for what you are trying to do. But
first, I don't understand one aspect. You talk about reading a whole
file into a string. Is this whole file all on one line (record)? If not,
you need to define what you mean to do about the record boundaries. A
string doesn't have anything comparable to records. Perhaps you mean to
just effectively concatenate all the lines together. Or perhaps you mean
to do something like put special characters in the string to denote the
line ends. Whichever of these you do, you'll need special work.
I need to go in about 3 minutes, so no time to write out code samples,
but I suspect the best option for you would be to use non-advancing
reads. If that's not enough hint, probably someone else can give a bit
more (or I might be able to later).
--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain
| |
| Pepijn Kenter 2006-01-18, 10:00 pm |
| Richard E Maine wrote:
> <titusjan@gmail.com> wrote:
>
> I can't parse that sentence. Corresponds to what? I guess it doesn't
> matter, though.
>
Sorry, after a long time reading fortran documentation I'm a bit groggy
and not very precise. What I meant was that in formatted I/O, every READ
statement reads one or more records and that I understand that every
record consists of one line.
I can't seem to read an ascii file one character at a time.
> You are probably thinking of unformatted as being like what is called
> stream in f2003 - just the raw data, with no "extra" stuff. It isn't.
I'm not familiar with F2003 and streams but indeed I imagined an
unformatted file to be just the data.
> There are non-standard ways to do the equivalent of f2003 stream in f95
> compilers (and some of them even implement the f2003 syntax). That could
> be made to work for you, but I doubt it would be my first
> recommendation.
>
I agree. That's not the way to go.
> There are several possibilities for what you are trying to do. But
> first, I don't understand one aspect. You talk about reading a whole
> file into a string. Is this whole file all on one line (record)? If not,
> you need to define what you mean to do about the record boundaries. A
> string doesn't have anything comparable to records. Perhaps you mean to
> just effectively concatenate all the lines together. Or perhaps you mean
> to do something like put special characters in the string to denote the
> line ends. Whichever of these you do, you'll need special work.
>
The ascii file consists of more than one line. I want to read this into
a character array that also contains the end of line characters (\n). I
intend to write this character array to the HDF file, including the
\n's. So there's no need to replace the /013 characters with something else.
While reading your reply I'm getting the feeling that my mental model of
fortran I/O is wrong. As you might have noticed, I'm more used to C.
> I need to go in about 3 minutes, so no time to write out code samples,
> but I suspect the best option for you would be to use non-advancing
> reads. If that's not enough hint, probably someone else can give a bit
> more (or I might be able to later).
>
I don't see how non-advancing reads will help me here. But thanks for
your quick reply.
Pepijn Kenter.
| |
| glen herrmannsfeldt 2006-01-18, 10:00 pm |
| Pepijn Kenter <titusjan2@gmail.com> wrote:
> Sorry, after a long time reading fortran documentation I'm a bit groggy
> and not very precise. What I meant was that in formatted I/O, every READ
> statement reads one or more records and that I understand that every
> record consists of one line.
In unformatted I/O it is called a record, in formatted either a
line or record.
Unformatted I/O allows one to read less data than a record
actually contains, but never start in the middle of a record.
READ(1)
will skip one record, independent of its length. Some systems
have file systems that keep a record structure, others (unix, DOS,
and Windows) require the Fortran library to supply the record
structure. It is usually done by putting a record length at the
beginning of each record.
I would probably read your file with a loop reading A format,
then concatenating the results together with newline characters
in between and at the end. Then it is even portable to systems
that use other line terminators or a record oriented file system.
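The loop described above could look roughly like this. It is a minimal sketch under several assumptions that are not in the post: the input file name ascii_file.txt is hypothetical, no line is longer than max_line characters (longer lines would be silently truncated), trailing blanks on a line are not significant (TRIM removes them), and the compiler supports F2003 deferred-length allocatable strings.

```fortran
program lines_to_string
  implicit none
  integer, parameter :: lun = 21                            ! arbitrary unit number
  integer, parameter :: max_line = 1024                     ! assumed upper limit on line length
  character(len=*), parameter :: fname = 'ascii_file.txt'   ! hypothetical input file
  character(len=max_line) :: line
  character(len=:), allocatable :: text                     ! F2003 deferred-length string
  integer :: ios

  open(lun, file=fname, form='formatted', status='old', action='read', iostat=ios)
  if (ios /= 0) stop 'Unable to open file'

  text = ''
  do
    ! One READ with A format consumes exactly one record (line).
    read(lun, '(a)', iostat=ios) line
    if (ios /= 0) exit               ! end of file, or a read error
    ! The line terminator never reaches the program, so put one back.
    text = text // trim(line) // achar(10)
  end do
  close(lun)

  print *, 'characters stored: ', len(text)
end program lines_to_string
```

Because the newline is inserted by the program rather than copied from the file, the result is the same whatever line terminator the file system uses.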
-- glen
| |
| Pepijn Kenter 2006-01-19, 4:06 am |
| >
> In unformatted I/O it is called a record, in formatted either a
> line or record.
>
> Unformatted I/O allows one to read less data than a record
> actually contains, but never start in the middle of a record.
>
> READ(1)
>
> will skip one record, independent of its length. Some systems
> have file systems that keep a record structure, others (unix, DOS,
> and Windows) require the Fortran library to supply the record
> structure. It is usually done by putting a record length at the
> beginning of each record.
>
Ah, I see.
>
> I would probably read your file with a loop reading A format,
> then concatenating the results together with newline characters
> in between and at the end.
But then I would have to know the length of a line before I read it into
a string. Therefore I'm trying to read it one character at a time.
> Then it is even portable to systems
> that use other line terminators or a record oriented file system.
>
It only has to run under Linux/unix.
Pepijn.
| |
| Arjen Markus 2006-01-19, 4:06 am |
Have a look at http://flibs.sf.net - the library I describe there has a
module for dealing with arbitrary-length strings (not the n-th
variation on the theme ;) - I just wanted to be able to store such
strings). It has a routine to read a single complete line using
non-advancing reads.
Regards,
Arjen
| |
| meek@skyway.usask.ca 2006-01-19, 8:02 am |
| In a previous article, titusjan@gmail.com wrote:
>Dear all.
>
>I'm trying to read a complete ascii file into a single character string
>so I can write it into an HDF4 file dataset.
If it's just a copy of the original file, why not just
append files to make the hdf4 (disclaimer - I have no idea
how an hdf4 is formatted).
If you have to modify something, unformatted binary ...
sometimes called stream, is the best way (if your compiler
has such capability)
Chris
| |
| Rich Townsend 2006-01-19, 7:05 pm |
| Arjen Markus wrote:
> Have a look at http://flibs.sf.net - the library I describe there has a
> module for dealing with arbitrary-length strings (not the n-th
> variation on the theme ;) - I just wanted to be able to store such
> strings). It has a routine to read a single complete line using
> non-advancing reads.
>
> Regards,
>
> Arjen
>
Likewise, my implementation of Part 2 of the Fortran standard, in the form
of the ISO_VARYING_STRING module, can read an arbitrary-length line and return it
in a VARYING_STRING datatype. Not very efficiently, I might add, but at least
it's (a) robust and (b) well specified.
Get it from:
http://www.star.ucl.ac.uk/~rhdt/download/#iso
Note: the little compatibility table on the above page is way out of date, so
don't let it bother you!
cheers,
Rich
| |
| Rich Townsend 2006-01-19, 7:05 pm |
| Rich Townsend wrote:
> Arjen Markus wrote:
>
>
> Likewise, my implementation of Part 2 of the Fortran standard, in
> the form of the ISO_VARYING_STRING module, can read an arbitrary-length
> line and return it in a VARYING_STRING datatype. Not very efficiently, I
> might add, but at least it's (a) robust and (b) well specified.
>
> Get it from:
>
> http://www.star.ucl.ac.uk/~rhdt/download/#iso
>
> Note: the little compatibility table on the above page is way out of
> date, so don't let it bother you!
>
> cheers,
>
> Rich
Further to my post above, here's an example program to read the file into a string:
---CUT HERE---
program read_ascii

  ! USE statements must come before IMPLICIT NONE
  use ISO_VARYING_STRING

  implicit none

  integer, parameter   :: unit = 69
  type(VARYING_STRING) :: string
  type(VARYING_STRING) :: line
  integer              :: iostat

  OPEN(UNIT=unit, FILE='ascii_file.txt', STATUS='OLD')

  ! Read one line per iteration and append it to the result
  do
     call GET(unit, line, IOSTAT=iostat)
     if(iostat > 0) stop 'Read error'
     if(iostat < 0) exit
     string = string//line
  enddo

  print *,CHAR(string)

end program
---CUT HERE---
NOTE: This will discard any end-of-line characters (e.g., CR and/or LF). If
these must appear in the string, then you may need to reinsert them in the
string//line concatenation.
cheers,
Rich
| |
| Richard E Maine 2006-01-19, 7:05 pm |
| Pepijn Kenter <titusjan2@gmail.com> wrote:
> I can't seem to read an ascii file one character at a time.
Non-advancing will do that. See below.
> The ascii file consists of more than one line. I want to read this into
> a character array that also contains the end of line characters (\n). I
> intend to write this character array to the HDF file, including the
> \n's. So there's no need to replace the /013 characters with something else.
Ok. I guessed that was likely. Note that, as far as Fortran is
concerned, the file doesn't have control characters at all. It has
records. How the records are implemented is up to the processor, but
even if it involves cr and/or lf characters, you won't see those in your
Fortran program. You'll just see a record. Thus you will have to
"translate" the end-of-record into the appropriate marker character that
you want in the file.
> While reading your reply I'm getting the feeling that my mental model of
> fortran I/O is wrong. As you might have noticed, I'm more used to C.
The f2003 stream I/O is modeled very closely after C files. In fact, it
is modeled so closely that some C programmers who don't actually know the
C standard very well are surprised by some of the details (for example,
the way that writing in the "middle" of a file works for formatted vs
unformatted files).
As a slightly related aside, note that C doesn't literally give you the
data from formatted files either. There also, the actual record
structure is "translated" if needed into \n characters returned to the
program. But I digress.
F2003 stream would work for you if your compiler supported it. But any
f90 compiler is sure to support non-advancing.
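For a compiler that does support the F2003 features, the stream approach could be sketched as follows. This is only an illustration under stated assumptions: the file name ascii_file.txt is hypothetical, INQUIRE(SIZE=) is assumed to return the size in bytes, and one default character is assumed to occupy one byte.

```fortran
program slurp_stream
  implicit none
  integer, parameter :: lun = 22                            ! arbitrary unit number
  character(len=*), parameter :: fname = 'ascii_file.txt'   ! hypothetical input file
  character(len=:), allocatable :: text                     ! F2003 deferred-length string
  integer :: nbytes, ios

  ! F2003: file size in file storage units; negative if it cannot be determined.
  inquire(file=fname, size=nbytes)
  if (nbytes < 0) stop 'File size not available'

  ! Stream access: raw bytes, no record markers expected or added.
  open(lun, file=fname, access='stream', form='unformatted', &
       status='old', action='read', iostat=ios)
  if (ios /= 0) stop 'Unable to open file'

  allocate(character(len=nbytes) :: text)
  if (nbytes > 0) read(lun, iostat=ios) text
  close(lun)
  if (ios /= 0) stop 'Read error'

  print *, 'bytes read: ', len(text)
end program slurp_stream
```

With unformatted stream access the line terminators arrive in the string exactly as they are stored on disk, so nothing has to be translated or reinserted.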
> I don't see how non-advancing reads will help me here. But thanks for
> your quick reply.
Sample below. (Tested even, though only briefly). This program reads its
own source file. The iostat_eor and iostat_eof values are
compiler-dependent (but these are very common ones). I haven't bothered
with dynamically determining the file size. (Another f2003 feature is
handy for that).
Oh, and as another aside, the_file here (like your config_str) is *NOT*
a Fortran string. It is an array of Fortran strings, each of which is 1
character long. There are places where the difference matters. The array
of strings is fine for current purposes; in fact probably best, as it is
easier to allocate dynamically (that distinction disappears in f2003).
The array of characters is much more like the C model, but if you take
to calling it a string in Fortran, you'll eventually get misled (if
nothing else, people will give you suggestions that won't work because
they think you mean normal Fortran strings).
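The distinction drawn in this aside can be illustrated with a few lines. The sketch below is not from the post and all names in it are made up; it declares one array of five length-1 strings and one string of length 5, and copies between them element by element.

```fortran
program array_vs_string
  implicit none
  character(len=1) :: chars(5)   ! array of five strings, each of length 1
  character(len=5) :: str        ! one string of length 5
  integer :: i

  str = 'hello'

  ! Copy the string into the array one character at a time.
  do i = 1, len(str)
    chars(i) = str(i:i)
  end do

  ! SIZE applies to the array; LEN of the array is the length of each element.
  print *, 'size(chars) = ', size(chars), '  len(chars) = ', len(chars)
  print *, 'len(str)    = ', len(str)

  ! Substring syntax works on the string; the array needs an array section.
  print *, str(2:4)
  print *, chars(2:4)

  ! Copy the array back into the string.
  do i = 1, size(chars)
    str(i:i) = chars(i)
  end do
end program array_vs_string
```

Intrinsics such as INDEX, TRIM and the // operator treat str as one unit but act elementwise on chars, which is where advice meant for "normal Fortran strings" stops applying.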
program readme
  implicit none
  integer, parameter :: iostat_eof = -1, iostat_eor = -2
  integer :: n_chars, iostat, max_len
  character*1, allocatable :: the_file(:)
  character :: c*1

  max_len = 1024
  allocate(the_file(max_len))
  open(11,file='readme.f90',form='formatted',status='old')
  n_chars = 0
  line_loop: do
    char_loop: do
      read(11,'(a1)',advance='no',iostat=iostat) c
      if (iostat==iostat_eof) exit line_loop
      if (iostat==iostat_eor) exit char_loop
      if (iostat/=0) then
        write (*,*) 'Oops. iostat = ', iostat
        stop
      end if
      if (n_chars+2>max_len) then
        write (*,*) 'Oops. Too long.'
        stop
      end if
      n_chars = n_chars + 1
      the_file(n_chars) = c
    end do char_loop
    n_chars = n_chars + 1
    the_file(n_chars) = achar(10)
    write (*,*) 'Line done at n_chars = ', n_chars
  end do line_loop
  write (*,*) 'File done at n_chars = ', n_chars
  write (*,*) the_file(1:n_chars)
  stop
end program readme
By the way. Using size= should work as an alternative to reading
character-at-a-time. That would allow you to read line-at-a-time. But
one sample seems enough.
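For completeness, a size= variant might look like the sketch below. It is not the program from this post and rests on assumptions: the file name ascii_file.txt and the chunk length of 80 are arbitrary choices, and it leans on F2003 conveniences (the iostat_end and iostat_eor constants from ISO_FORTRAN_ENV, and a deferred-length allocatable string) instead of hard-coded iostat values.

```fortran
program read_chunks
  use, intrinsic :: iso_fortran_env, only: iostat_end, iostat_eor   ! F2003
  implicit none
  integer, parameter :: lun = 23                            ! arbitrary unit number
  integer, parameter :: chunk_len = 80                      ! arbitrary chunk size
  character(len=*), parameter :: fname = 'ascii_file.txt'   ! hypothetical input file
  character(len=chunk_len) :: chunk
  character(len=:), allocatable :: text
  integer :: nread, ios

  open(lun, file=fname, form='formatted', status='old', action='read', iostat=ios)
  if (ios /= 0) stop 'Unable to open file'

  text = ''
  do
    ! Non-advancing read: take up to chunk_len characters of the current
    ! record; size= reports how many were actually transferred.
    read(lun, '(a)', advance='no', size=nread, iostat=ios) chunk
    if (ios == iostat_end) exit
    if (ios /= 0 .and. ios /= iostat_eor) stop 'Read error'
    text = text // chunk(1:nread)
    ! End of record: the line is complete, so add the newline marker.
    if (ios == iostat_eor) text = text // achar(10)
  end do
  close(lun)

  print *, 'characters stored: ', len(text)
end program read_chunks
```

Unlike the TRIM-based approach, chunk(1:nread) keeps trailing blanks that are really in the file, because the count comes from the read itself.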
--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain
| |
| Kevin G. Rhoads 2006-01-19, 7:05 pm |
| > > I would probably read your file with a loop reading A format,
>
>But then I would have to know the length of a line before I read it into
>a string. Therefore I'm trying to read it one character at a time.
If you have a definite upper limit for line length, use a string of that
length and after reading determine the end of the string. Old style this
was non-standard, usually called LenTrim or Len_Trim, but there are standard
ways of doing that now.
| |
| Richard E Maine 2006-01-19, 7:05 pm |
| Kevin G. Rhoads <kgrhoads@alum.mit.edu> wrote:
> If you have a definite upper limit for line length, use a string of that
> length and after reading determine the end of the string. Old style this
> was non-standard, usually called LenTrim or Len_Trim, but there are standard
> ways of doing that now.
Please note that len_trim is *NOT* necessarily the same thing as the
length of the string. It is just the length of the non-blank portion.
For some apps (likely including this one from the sounds of it), this
might be adequate; but for other apps, it might not be.
For current purposes, this is probably just a wording quibble. But if I
let such incorrect wordings go, someone else will probably try to use
len_trim when it isn't the right answer.
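A tiny sketch of the difference, with made-up variable names: the value assigned below ends in two blanks that might be significant, but once it sits in a fixed-length variable those blanks cannot be told apart from the padding.

```fortran
program len_vs_len_trim
  implicit none
  character(len=20) :: buffer

  buffer = 'abc  '   ! 'abc' plus two trailing blanks, then blank padding

  print *, 'len(buffer)      = ', len(buffer)        ! declared length: 20
  print *, 'len_trim(buffer) = ', len_trim(buffer)   ! last non-blank position: 3
end program len_vs_len_trim
```

Neither number is 5, the length of the text that was assigned, which is the case where len_trim "isn't the right answer".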
--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain
| |
| glen herrmannsfeldt 2006-01-20, 3:58 am |
| Richard E Maine wrote:
(snip)
> As a slightly related aside, note that C doesn't literally give you the
> data from formatted files either. There also, the actual record
> structure is "translated" if needed into \n characters returned to the
> program. But I digress.
The C standard was carefully written to allow that.
Some of the interesting cases involve fseek() and ftell() for seeking
to a specified position in the file, and finding the current position,
respectively. For text files (the default) you can't rely on doing
arithmetic on the offset values.
-- glen
| |
| Arjen Markus 2006-01-20, 3:58 am |
| I see you use the approach of storing the strings in separate
characters.
Ever considered doing it in larger chunks? My module uses chunks of
40 characters and when it needs to consider the string as a whole, it
glues them together in a temporary string of sufficient length.
Something like:
  subroutine write_string( lun, string, length )
    integer :: lun
    character(len=*), dimension(:) :: string
    character(len=length) :: tmp_string
    ! Copy the pieces into the temporary string
    ....
  end subroutine
The length argument is either stored with the varying-length string
or computed from similar information.
This way I can use the standard character routines (saves a lot
of code). I have not measured the performance, but my main goal
was to minimize the time I had to spend writing it.
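One way the "copy the pieces" step might be filled in is sketched below. This is a hypothetical illustration of the chunking idea only, not the code from the flibs library: the module name chunk_demo, the routine name write_chunks and the explicit length argument are all assumptions made for the example.

```fortran
module chunk_demo
  implicit none
  integer, parameter :: chunk_len = 40   ! chunk size mentioned in the post
contains
  ! Hypothetical helper: glue fixed-size chunks into one temporary string
  ! of exactly "length" characters and write it as a single record.
  subroutine write_chunks(lun, chunks, length)
    integer, intent(in) :: lun
    character(len=chunk_len), intent(in) :: chunks(:)
    integer, intent(in) :: length          ! number of significant characters
    character(len=length) :: tmp_string    ! automatic variable sized at entry
    integer :: i, first, last

    tmp_string = ''
    do i = 1, size(chunks)
      first = (i - 1)*chunk_len + 1
      if (first > length) exit
      last = min(i*chunk_len, length)
      tmp_string(first:last) = chunks(i)(1:last - first + 1)
    end do
    write(lun, '(a)') tmp_string
  end subroutine write_chunks
end module chunk_demo

program chunk_demo_main
  use, intrinsic :: iso_fortran_env, only: output_unit
  use chunk_demo
  implicit none
  character(len=chunk_len) :: pieces(2)

  pieces(1) = repeat('a', chunk_len)   ! a full first chunk
  pieces(2) = 'tail'                   ! a partly used second chunk
  call write_chunks(output_unit, pieces, chunk_len + 4)
end program chunk_demo_main
```

Once the pieces are in tmp_string, the ordinary character intrinsics apply to it, which is the saving in code the post describes.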
Regards,
Arjen
| |
| Kevin G. Rhoads 2006-01-20, 7:09 pm |
| >For current purposes, this is probably just a wording quibble. But if I
>let such incorrect wordings go, someone else will probably try to use
>len_trim when it isn't the right answer.
All too true.
On another note, I often enjoy your quibbles Richard. Yet another way to learn.
| |
| David Ham 2006-01-20, 7:09 pm |
| Kevin G. Rhoads wrote:
>
> All too true.
>
> On another note, I often enjoy your quibbles Richard. Yet another way to learn.
Seconded. Side remarks from Richard are probably second only to M,R+C as
a tool for learning the precise meaning of the Fortran standard.
David
| |
| James Van Buskirk 2006-01-20, 7:09 pm |
| "Arjen Markus" <arjen.markus@wldelft.nl> wrote in message
news:1137750411.799567.308950@g44g2000cwa.googlegroups.com...
> I see you use the approach of storing the strings in separate
> characters.
> Ever considered doing it in larger chunks?
Yes, I like the idea of larger chunks. Here's an example that
works with CVF 6.6C3:
C:\Program Files\Microsoft Visual Studio\James\clf\file_string>df /nologo file_string.f90
file_string.f90
C:\Program Files\Microsoft Visual Studio\James\clf\file_string>file_string
program file_string
   use DFPORT
   implicit none
   integer file_size
   character(*), parameter :: filename = 'file_string.f90'
   integer result
   integer statb(12)
   integer iunit

   iunit = 10
   open(iunit,file=filename)
   result = fstat(iunit,statb)
   if(result /= 0) then
      write(*,'(a,z8.8)') 'Error status returned by FSTAT = ',result
      stop
   end if
   file_size = statb(8)
   close(iunit)
   call realsub(filename,file_size)
end program file_string

subroutine realsub(filename,file_size)
   character(*), intent(in) :: filename
   integer, intent(in) :: file_size
   character(file_size) file_data
   integer line_start, line_end
   character, parameter :: CR = achar(13)

   open(10,file=filename,form='BINARY')
   read(10) file_data
   line_start = 1
   do
      line_end = index(file_data(line_start:),CR)
      if(line_end == 0) exit
      line_end = line_start+line_end-2
      write(*,'(a)') file_data(line_start:line_end)
      line_start = line_end+3
   end do
end subroutine realsub
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end