Reading a file with an unknown amount of data
deltaquattro

2008-01-08, 7:17 pm

Hi, guys,

long time no see :-) The following program can read a file containing
an unknown number of lines, each line containing an unknown number of
reals:

PROGRAM TEST

IMPLICIT NONE
INTEGER, PARAMETER :: MAX = 100
INTEGER :: I, N, IO
REAL, DIMENSION(MAX) :: X

OPEN (1, FILE = 'DATA')
READ( 1, *, IOSTAT = IO ) ( X(I), I = 1, MAX )

IF (IO < 0) THEN
N = I - 1
ELSE
N = MAX
END IF

PRINT*, N
PRINT*, ( X(I), I = 1, N )

END PROGRAM TEST

Using the implied do loop to read any number of reals per line is
neat, but I don't like the fact that, if the file DATA
is empty (or contains one or more empty lines), then the number of
lines N is incorrectly detected as -1. Why does this happen? How would
you solve this problem? If we knew that each line contains only one
real, we could count the number of lines in advance by using:

N=0
DO
READ( 1, *, IOSTAT = IO )
IF (IO < 0) EXIT
N = N + 1
END DO

but this does not work if the file contains empty lines.
Thanks,

Best regards,

deltaquattro

dpb

2008-01-08, 7:17 pm

deltaquattro wrote:
> Hi, guys,
>
> long time no see :-) The following program can read a file containing
> an unknown number of lines, each line containing an unknown number of
> reals:
>

....
> READ( 1, *, IOSTAT = IO ) ( X(I), I = 1, MAX )
> IF (IO < 0) THEN
> N = I - 1

....
> Using the implied do loop to read any number of reals per line is
> neat, but I don't like the fact that, if the file DATA
> is empty (or contains one or more empty lines), then the number of
> lines N is incorrectly detected as -1. Why does this happen? How would
> you solve this problem? ...


As for why: the READ fails for the first value of the implied loop,
so I isn't incremented, and you're not testing for that case.

Typically these kinds of problems are solved by reading the file into a
CHARACTER variable via an 'A' format, then parsing the string with
internal reads.
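A minimal sketch of that approach, assuming blank- or comma-separated reals with no slashes, null values or repeat counts, and lines of at most 1000 characters. The module and routine names (line_parse, count_reals, read_record) are hypothetical. Each record is read whole with an '(A)' format; the number of values is then found by retrying a list-directed internal read with one more element until it fails, so nothing depends on the DO variable of an implied loop after an error.

```fortran
module line_parse
  implicit none
contains

  ! Count how many reals a list-directed internal read can take from LINE
  ! by retrying with one more element until the read fails.
  ! Assumes blank/comma separators, no '/', no null values, no r*c repeats.
  integer function count_reals(line) result(n)
    character(len=*), intent(in) :: line
    real, allocatable :: buf(:)
    integer :: k, ios

    ! Each value needs at least one character plus one separator.
    allocate(buf(len(line)/2 + 1))
    n = 0
    do k = 1, size(buf)
      read(line, *, iostat=ios) buf(1:k)
      if (ios /= 0) exit      ! ran past the end of LINE, or hit bad data
      n = k
    end do
  end function count_reals

  ! Read the next record of UNIT into X; N is the number of values on it.
  ! IOS < 0 signals end of file (X is then allocated with size 0).
  subroutine read_record(unit, x, n, ios)
    integer, intent(in) :: unit
    real, allocatable, intent(out) :: x(:)
    integer, intent(out) :: n, ios
    character(len=1000) :: buf          ! assumed maximum line length

    n = 0
    read(unit, '(a)', iostat=ios) buf
    if (ios /= 0) then
      allocate(x(0))
      return
    end if
    n = count_reals(buf)
    allocate(x(n))
    if (n > 0) read(buf, *) x           ! one internal read for the whole line
  end subroutine read_record

end module line_parse
```

The retry loop is quadratic in the number of values per line, which is harmless for short records; a single character scan over the buffer is the linear alternative.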

--

Paul van Delst

2008-01-08, 7:17 pm

deltaquattro wrote:
> ....
> If we knew that each line contains only one
> real, we could count the number of lines in advance by using:
>
> N=0
> DO
> READ( 1, *, IOSTAT = IO )
> IF (IO < 0) EXIT
> N = N + 1
> END DO
>
> but this does not work if the file contains empty lines.


Read each line into a character buffer and test for zero length. If it is zero, go to the
next line. If the length is not zero, do an internal read from the buffer to your real
variables.
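One caveat: a fixed-length CHARACTER buffer is padded with blanks, so "zero length" in practice means LEN_TRIM(buffer) == 0 (a line holding only a tab would still count as non-blank). The sketch below, with hypothetical module and routine names, applies that test in two passes: it counts the non-blank lines, allocates an array of exactly that size, rewinds, and reads one real from each non-blank line by an internal read. It assumes one real per non-blank line and lines no longer than 256 characters.

```fortran
module blank_skip
  implicit none
contains

  ! Two passes over UNIT: count non-blank lines, then read one real from each.
  ! Assumes one real per non-blank line and lines of at most 256 characters.
  subroutine read_column(unit, x)
    integer, intent(in) :: unit
    real, allocatable, intent(out) :: x(:)
    character(len=256) :: buf
    integer :: n, ios

    n = 0
    do                                   ! pass 1: count non-blank lines
      read(unit, '(a)', iostat=ios) buf
      if (ios /= 0) exit
      if (len_trim(buf) > 0) n = n + 1
    end do

    allocate(x(n))
    rewind(unit)
    n = 0
    do                                   ! pass 2: internal read of each value
      read(unit, '(a)', iostat=ios) buf
      if (ios /= 0) exit
      if (len_trim(buf) == 0) cycle      ! blank-padded buffer: "empty" line
      n = n + 1
      read(buf, *) x(n)
    end do
  end subroutine read_column

end module blank_skip
```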

cheers,

paulv

Dick Hendrickson

2008-01-08, 7:17 pm

deltaquattro wrote:
> ....
> OPEN (1, FILE = 'DATA')
> READ( 1, *, IOSTAT = IO ) ( X(I), I = 1, MAX )


This isn't a standard conforming way to do things. If an error
condition occurs, then the DO variable, I, becomes undefined.
Your program only works when there are 100 or more values on
the line.

There's no direct solution. The standard imposes essentially no
rules when IOSTAT comes back negative. As others have suggested,
the normal way is to do internal reads.

Dick Hendrickson

James Giles

2008-01-08, 7:17 pm

Dick Hendrickson wrote:
> deltaquattro wrote:

....
....
>
> This isn't a standard conforming way to do things. If an error
> condition occurs, then the DO variable, I, becomes undefined.
> Your program only works when there are 100 or more values on
> the line.


It's worse than that. Since the format specification is *, this
will read as many records as needed until it finds something
that can't be converted to the type of X, until some error,
until it finds a slash ('/') on the input, or until it has read 100
values.

In any case, an implied loop isn't necessarily implemented as
a loop. In this case what it tells the compiler is which elements
of X are considered to be items on the I/O list (turns out that it's
all of X in this example). This can be determined before the
I/O statement is even processed. The value of I is not required
to have any relation to "how many values were successfully
read".
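The multi-record behaviour is easy to see with an internal file that is a CHARACTER array, since each element is one record. In this sketch (module and routine names are hypothetical), a single list-directed READ into a five-element array takes values from as many records as it needs, and a slash ends the input early, leaving the remaining elements unchanged.

```fortran
module list_directed_demo
  implicit none
contains

  ! One list-directed READ from an internal file with several records.
  ! Input continues across records until X is full, a '/' is met,
  ! or an error or end-of-file occurs. Elements not reached keep
  ! their previous values, hence INTENT(INOUT).
  subroutine fill_from_records(recs, x, ios)
    character(len=*), intent(in) :: recs(:)   ! each element is one record
    real, intent(inout) :: x(:)
    integer, intent(out) :: ios
    read(recs, *, iostat=ios) x
  end subroutine fill_from_records

end module list_directed_demo
```

With records '1.0 2.0', '3.0' and '4.0 5.0 6.0', a five-element X receives 1 to 5; with '1.0 2.0' and '3.0 /', only the first three elements change.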

--
J. Giles

"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare


deltaquattro

2008-01-09, 8:13 am

On 8 Jan, 19:50, "James Giles" <jamesgi...@worldnet.att.net> wrote:
> Dick Hendrickson wrote:
> ...
> ...
>

[..]
>
> In any case, an implied loop isn't necessarily implemented as
> a loop. In this case what it tells the compiler is which elements
> of X are considered to be items on the I/O list (turns out that it's
> all of X in this example). This can be determined before the
> I/O statement is even processed. The value of I is not required
> to have any relation to "how many values were successfully
> read".
>
> --
> J. Giles
>

[..]

Thank you guys. Your answers are enlightening as always. As a matter
of fact, the strange results I got made me suspect that this wasn't a
standard conforming way to do things. I understand that in the general
case I need to read each line in a string and parse it with internal
reads. However, in the simpler case when each line is empty or
contains just one real, will the following work *and* be standard
conforming? From my tests it seems so.

.
.
REAL :: A
.
.

EOF = .FALSE.
I = 0
DO WHILE(.NOT.(EOF))
   READ( 1, *, IOSTAT = IO ) A
   IF (IO < 0) THEN
      EOF = .TRUE.
   ELSE
      I = I + 1
      IF (I > MAX) THEN
         WRITE(*,*) 'ERROR: NUMBER OF DATA HIGHER THAN SIZE OF ARRAY X'
      ELSE
         X(I) = A
      END IF
   END IF
END DO
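The loop above relies on a property of list-directed input: a READ with one item keeps reading records until it finds a value, so blank lines are skipped without any test on the line itself. A minimal sketch packaging the same idea as a subroutine (module and routine names are hypothetical); it counts every value but stores only as many as fit in X, leaving the overflow decision to the caller.

```fortran
module one_per_line
  implicit none
contains

  ! Read one real per READ until end of file. N counts every value found;
  ! only the first SIZE(X) are stored (the caller checks N > SIZE(X)).
  ! IOS is 0 after a normal end of file and > 0 on bad data.
  subroutine read_values(unit, x, n, ios)
    integer, intent(in) :: unit
    real, intent(out) :: x(:)
    integer, intent(out) :: n, ios
    real :: a

    n = 0
    do
      ! A list-directed READ of one item reads further records until it
      ! finds a value, so blank lines need no explicit test.
      read(unit, *, iostat=ios) a
      if (ios /= 0) exit
      n = n + 1
      if (n <= size(x)) x(n) = a
    end do
    if (ios < 0) ios = 0        ! end of file is the expected way out
  end subroutine read_values

end module one_per_line
```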

Thanks again,

Best regards,

deltaquattro

deltaquattro

2008-01-09, 8:13 am

On 8 Jan, 16:32, Paul van Delst <Paul.vanDe...@noaa.gov> wrote:
> deltaquattro wrote:
>
> Read each line into a character buffer and test for zero length. If it is zero, go to the
> next line. If the length is not zero, do an internal read from the buffer to your real
> variables.
>
> cheers,
>
> paulv


Thank you, dpb, Paul and Dick, for the suggestion about the character
buffer. However, I'm not sure how to perform the internal read if
there is an unknown number of reals per line. A solution could be
using the index intrinsic to identify substrings enclosed by blanks,
then read each substring into a real variable by list-directed input.
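The substring idea can also be done with a single character scan instead of repeated INDEX calls: count the fields, allocate an array of that size, then let one list-directed internal read do the conversion. A sketch assuming fields are separated only by blanks or commas (no tabs, quoted strings, null values or r*c repeat counts); count_fields and parse_reals are hypothetical names, not the Count_Items routine from David Frank's library.

```fortran
module field_split
  implicit none
contains

  ! Count blank- or comma-separated fields in LINE with a single scan.
  ! Assumes no tabs, quoted strings, null values (",,") or r*c repeats.
  integer function count_fields(line) result(n)
    character(len=*), intent(in) :: line
    logical :: in_field
    integer :: i

    n = 0
    in_field = .false.
    do i = 1, len(line)
      if (line(i:i) == ' ' .or. line(i:i) == ',') then
        in_field = .false.
      else
        if (.not. in_field) n = n + 1    ! first character of a new field
        in_field = .true.
      end if
    end do
  end function count_fields

  ! Count the fields, allocate X to match, then do one internal read.
  subroutine parse_reals(line, x)
    character(len=*), intent(in) :: line
    real, allocatable, intent(out) :: x(:)
    allocate(x(count_fields(line)))
    if (size(x) > 0) read(line, *) x
  end subroutine parse_reals

end module field_split
```

For example, count_fields('123 4.5 22.22,33.33 4') gives 5, the same count as the first line of the test file in David Frank's program.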

Best regards,

deltaquattro

ps: maybe input files where both the number of lines and the number of
values per line are unspecified are uncommon, to say the least :) so this
could seem a weird problem to think about. However, it taught me a
lot about the Fortran standard. Thanks again.

David Frank

2008-01-09, 8:13 am

The program below demonstrates what your topic asks for,
using my string functions available at the link below:
http://home.earthlink.net/~dave_gemini/strings.f90


! -------------------------------
include "strings.f90"
program demo_variable_array_input
use string_functions
implicit none
character(200) :: dat
integer :: num
real,allocatable :: x(:)

open (1,file='test.txt')                    ! create 3-line text file (plus a blank line)
write (1,*) '123 4.5 22.22,33.33 4'         ! line = 5 items
write (1,*) '2123 4.5 22.22,33.33 55.5,66'  ! line = 6
write (1,*) ' '                             ! blank line
write (1,*) '3123 4.5 2,22,33 -11 777 '     ! line = 7
rewind (1)                                  ! total = 18 items to input

num = 0
do
read (1,'(a)',end=100) dat
num = num + Count_Items(dat)
end do
100 rewind (1)
allocate ( x(num) )
read (1,*) x
write (*,'(a,i0)') 'num items input= ', num ! = 18
end program


Paul van Delst

2008-01-09, 7:20 pm

deltaquattro wrote:
> On 8 Jan, 19:50, "James Giles" <jamesgi...@worldnet.att.net> wrote:
> [..]
> [..]
>
> Thank you guys. Your answers are enlightening as always. As a matter
> of fact, the strange results I got made me suspect that this wasn't a
> standard conforming way to do things. I understand that in the general
> case I need to read each line in a string and parse it with internal
> reads. However, in the simpler case when each line is empty or
> contains just one real, will the following work *and* be standard
> conforming? From my tests it seems so.
>


The code looks o.k., but how about

dpb

2008-01-09, 10:12 pm

deltaquattro wrote:
....

> Thank you, dpb, Paul and Dick, for the suggestion about the character
> buffer. However, I'm not sure how to perform the internal read if
> there is an unknown number of reals per line. A solution could be
> using the index intrinsic to identify substrings enclosed by blanks,
> then read each substring into a real variable by list-directed input.


The most general and robust solution would be to count "words" in the
line to determine the number of entries to be read. If I were at the
development machine I'd post a routine that does it, but the local
network is down at the moment :( so it's not readily accessible.

The other way to handle the problem is to include the necessary data in
the file as the first record if you have control over the writing of the
files in the first place, of course.
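If you do control the writer, the count-in-the-first-record layout takes only a few lines. A sketch with hypothetical module and routine names: the writer puts SIZE(X) on the first record, and the reader allocates from it before one list-directed READ, which may span as many records as the values occupy.

```fortran
module counted_file
  implicit none
contains

  ! Write the number of values as the first record, then the values.
  subroutine write_counted(unit, x)
    integer, intent(in) :: unit
    real, intent(in) :: x(:)
    write(unit, *) size(x)
    write(unit, *) x
  end subroutine write_counted

  ! Read the count from the first record, allocate, then read the values.
  ! On a nonzero IOS from the first READ, X is left unallocated.
  subroutine read_counted(unit, x, ios)
    integer, intent(in) :: unit
    real, allocatable, intent(out) :: x(:)
    integer, intent(out) :: ios
    integer :: n

    read(unit, *, iostat=ios) n
    if (ios /= 0) return
    allocate(x(n))
    read(unit, *, iostat=ios) x      ! list-directed: may span several records
  end subroutine read_counted

end module counted_file
```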

--

dpb

2008-01-09, 10:12 pm

dpb wrote:
> deltaquattro wrote:
> ...
>
>
> The most general and robust solution would be to count "words" in the
> line to determine the number of entries to be read. ...


Replace "most" with "a general" ... :)

--

Dick Hendrickson

2008-01-09, 10:12 pm

deltaquattro wrote:
> On 8 Jan, 16:32, Paul van Delst <Paul.vanDe...@noaa.gov> wrote:
>
> Thank you, dpb, Paul and Dick, for the suggestion about the character
> buffer. However, I'm not sure how to perform the internal read if
> there is an unknown number of reals per line. A solution could be
> using the index intrinsic to identify substrings enclosed by blanks,
> then read each substring into a real variable by list-directed input.


It's awkward to do it from a character buffer. David Frank has one
method of doing it. You could also try non-advancing input. It allows
you to read one field at a time from the input line. Something like
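DO I = 1,max
   ! ADVANCE='NO' needs an explicit format: list-directed (*) input is
   ! not allowed here. This assumes every value, including the last one
   ! on the line, fills a full 10-column field (e.g. written with F10.3).
   READ(1,'(F10.0)',iostat=IO, eor = 10, advance = 'NO') X(I)
   if (io /= 0) then
      print *, "bad input data"
      stop
   endif
enddo
print *, "be careful, there were at least max elements"
read(1,*) !skip to next line
10 N = I -1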

That's just my first guess at how you might be able to do it.
Non-advancing input is record oriented. You'll have to do something
different if you want to read one array from more than one input
card.
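A different use of non-advancing input, sketched here instead of a field-by-field loop, is to read a record of unknown length in fixed-size chunks and then parse the assembled string with internal reads. The SIZE= specifier reports how many characters each chunk actually received. Assumptions: a Fortran 2003 compiler (for deferred-length CHARACTER and the IOSTAT_EOR/IOSTAT_END constants from ISO_FORTRAN_ENV); the module name, routine name and 80-character chunk size are arbitrary choices.

```fortran
module long_lines
  use iso_fortran_env, only: iostat_eor
  implicit none
contains

  ! Read one whole record of any length from UNIT, pulling it in
  ! 80-character chunks with non-advancing input. IOS is 0 on success
  ! and nonzero (e.g. IOSTAT_END) on end of file or error.
  subroutine read_line(unit, line, ios)
    integer, intent(in) :: unit
    character(len=:), allocatable, intent(out) :: line
    integer, intent(out) :: ios
    character(len=80) :: chunk
    integer :: nread

    line = ''
    do
      read(unit, '(a)', advance='no', iostat=ios, size=nread) chunk
      if (ios /= 0 .and. ios /= iostat_eor) exit   ! end of file or error
      line = line // chunk(1:nread)                ! keep only what was read
      if (ios == iostat_eor) then                  ! reached end of record
        ios = 0
        exit
      end if
    end do
  end subroutine read_line

end module long_lines
```

The returned LINE has exactly the record's length, so it can be handed straight to an internal read or a field-counting routine.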

Dick Hendrickson

James Giles

2008-01-09, 10:12 pm

Paul van Delst wrote:
....
> I = 0
> Read_Loop: DO
> READ( 1, *, IOSTAT = IO ) A
> IF (IOSTAT < 0) THEN
> close(1)
> EXIT Read_Loop
> END IF
> I = I + 1
> IF (I > MAX) THEN
> WRITE(*,*) 'ERROR: NUMBER OF DATA HIGHER THAN SIZE OF ARRAY'
> STOP
> END IF
> X(I) = A
> END DO Read_Loop


Your code has a bug. The IOSTAT variable set by the READ
is IO, but your IF statement is testing something called IOSTAT.

> The differences are mostly for subjective reasons:
>
> - I don't like "DO WHILE"


This really *is* subjective. I really strongly dislike EXIT.
Especially in cases like this where it's buried in yet another
nested block.

> - I don't like to test for .NOT. something. I.e. I'd rather test for
> a positive than negative result.


Fair enough. I agree we need an UNTIL statement.

--
J. Giles



robin

2008-02-17, 8:25 am

"deltaquattro" <deltaquattro@gmail.com> wrote in message
news:df61f3d3-8ddf-46bd-a928-4668279b463e@21g2000hsj.googlegroups.com...
> If we knew that each line contains only one
> real, we could count the number of lines in advance by using:
>
> N=0
> DO
> READ( 1, *, IOSTAT = IO )
> IF (IO < 0) EXIT
> N = N + 1
> END DO


If you know that each line is blank or holds one number,
you can write:

program r
character (len=100) :: line
real :: x(100)
integer :: i, n, io

open (10, file = 'DATA')
n = 0
do
read (10, '(A)', iostat = io) line
if (io < 0) exit
if (len_trim(line) == 0) cycle
n = n + 1
read (line, *) x(n)
end do
print *, n
print *, (x(i),i=1,n)
end program r


robin

2008-02-18, 7:22 pm

"James Giles" <jamesgiles@worldnet.att.net> wrote in message
news:Lyahj.120698$MJ6.69601@bgtnsc05-news.ops.worldnet.att.net...
> Paul van Delst wrote:
> ...
>
> Your code has a bug. The IOSTAT variable set by the READ
> is IO, but your IF statement is testing something called IOSTAT.


It's just a copying error.

>
> This really *is* subjective. I really strongly dislike EXIT.
> Especially in cases like this where it's buried in yet another
> nested block.


There's only one loop.

What would you use? A GO TO?



Copyright 2008 codecomments.com