Subject: comparing files
Bart Vandewoestyne

2005-09-13, 3:57 am

I have data-files that look like this:

0.324 0.543 0.897
0.437 0.787 0.367
0.321 0.423 0.424
0.433 0.434 0.854
... and so on...

These data-sets represent s-dimensional points. Each row is an
s-dimensional point, the columns represent the dimensions. This
example has 3 dimensions, but there can be more. The
number of spaces between the columns is arbitrary for a
certain file, but always the same for each row so all the points
align nicely. The formatting of the numbers can change... it's
not always F6.3, but it can easily be also F20.15 or whatever.
And in the ideal case it would be nice to allow any formatting that
can be specified for reals.

Now I have two programs that generate such data files, and I want to
compare the numbers in them. As soon as a single real number
differs by more than a specified tolerance from the corresponding
number in the other file, .false. should be returned.

I have already succeeded in checking whether fileA and fileB have the
same number of columns and the same number of lines, and if that is
not the case I can already return .false. But now comes the
harder part: comparing all the numbers. What would be the best
approach for this?

I tried something like

equal = .true.
do
read(unit=unit_set1, fmt="(F)") nb1
read(unit=unit_set2, fmt="(F)") nb2
if (nb1 /= nb2) then
equal = .false.
exit
end if
if (ios /= 0) exit
end do

but of course this doesn't compile because I need to specify a width in
the format string:

Warning (100): Nonnegative width required in format string at (1)
In file mod_testing.f95:116

The problem is that I don't know the width in advance, and in the
general case I can't even be sure that the reals are stored using the F
format specifier. Ideally I would also like to allow files with reals
stored in ES or some other real format specifier, but if that is
harder to get working, I can live without it for now.

What would be the best approach to compare the numbers from these two
data files and return .false. as soon as two of them differ by more
than a specified tolerance?

Regards,
Bart

--
"Share what you know. Learn what you don't."
Jan Vorbrüggen

2005-09-13, 3:57 am

How about a non-Fortran solution: use specdiff, SPEC CPU's special version
of diff, which allows you to set both a numerical tolerance and the number
of times deviations are allowed to occur before the files are declared
different. specdiff comes with every SPEC CPU2000 distribution. IIRC, the
SPEC web site even has some documentation on the tool.

Jan
Mr Hrundi V Bakshi

2005-09-13, 7:57 am

Try this or a variant thereof:

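integer(4) function fpCompare (x, y, nulps)

  implicit none

  real(8), intent(in) :: x, y
  integer(8), intent(in) :: nulps   ! tolerance in units of the last place
  ! Local
  real(8) difffp, deltafp

  difffp = x-y
  ! spacing() returns the gap between adjacent representable numbers near
  ! its argument, so deltafp scales with the magnitude of x and y.
  deltafp = nulps*2*spacing(max(abs(x),abs(y)))
  if ( difffp > deltafp ) then
    fpCompare = 1 !x > y
  elseif ( difffp < -deltafp ) then
    fpCompare = -1 !x < y
  else
    fpCompare = 0 !x == y (within tolerance)
  endif

  return

end function fpCompare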

--
You're Welcome,
Gerry T.
______
"Statistics is a science in my opinion, and it is no more a branch of
mathematics than are physics, chemistry and economics; for if its methods
fail the test of experience - not the test of logic - they are
discarded." -- J.W. Tukey.


Janne Blomqvist

2005-09-13, 7:57 am

Bart Vandewoestyne wrote:
> I have data-files that look like this:
>
> 0.324 0.543 0.897
> 0.437 0.787 0.367
> 0.321 0.423 0.424
> 0.433 0.434 0.854
> ... and so on...
>
> These data-sets represent s-dimensional points. Each row is an
> s-dimensional point, the columns represent the dimensions. This
> example has 3 dimensions, but there can be more. The
> number of spaces between the columns is arbitrary for a
> certain file, but always the same for each row so all the points
> align nicely. The formatting of the numbers can change... it's
> not always F6.3, but it can easily be also F20.15 or whatever.
> And in the ideal case it would be nice to allow any formatting that
> can be specified for reals.
>
> Now i have 2 programs that generate such datafiles, and I want to
> compare the numbers in it.


1) If you're the author of those 2 programs that generate the
datafiles, you can make the programs output a header specifying stuff
that make life easier for you, like the number of dimensions, the
number of lines, and the format string to use. Or why not use
unformatted while you're at it?

2) Try different formats and see if you get sensible results. E.g. if
you know your numbers are always [0,1] you can read a line, test that
all s numbers are in [0,1], if not backspace and try some different
format etc. Your program can contain a number of "usual" formats to
try.

2b) If the above fails and the program is interactive, you can ask the
user to input a format string to use.

3) Use list-directed input, i.e. the * format. It is part of the
standard, needs no field widths, and accepts F, E and ES style numbers.

4) Parse a line yourself, i.e. read it into a character variable,
tokenize it and try to make sense of the tokens.
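
Option 1 could look roughly like the sketch below. It assumes you control both generator programs. The module name header_io, the routines write_points and read_points, and the two-line header layout (dimensions, then the row format) are all made up for illustration, and newunit= needs a Fortran 2008 compiler.

```fortran
module header_io
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Writes points(ndims, npoints) preceded by a two-line header:
  ! first the two extents, then the format used for every row.
  ! (Hypothetical file layout, not something the original programs use.)
  subroutine write_points(filename, points, row_fmt)
    character(len=*), intent(in) :: filename, row_fmt
    real(dp), intent(in) :: points(:, :)
    integer :: u, i
    open(newunit=u, file=filename, status='replace', action='write')
    write(u, '(i0,1x,i0)') size(points, 1), size(points, 2)
    write(u, '(a)') row_fmt
    do i = 1, size(points, 2)
      write(u, row_fmt) points(:, i)
    end do
    close(u)
  end subroutine write_points

  ! Reads a file written by write_points, using the format stored in it.
  subroutine read_points(filename, points)
    character(len=*), intent(in) :: filename
    real(dp), allocatable, intent(out) :: points(:, :)
    character(len=256) :: row_fmt
    integer :: u, i, ndims, npoints
    open(newunit=u, file=filename, status='old', action='read')
    read(u, *) ndims, npoints
    read(u, '(a)') row_fmt
    allocate(points(ndims, npoints))
    do i = 1, npoints
      read(u, row_fmt) points(:, i)
    end do
    close(u)
  end subroutine read_points
end module header_io
```

A call such as write_points('a.dat', p, '(3(1x,f8.3))') stores the format alongside the data, so the reader never has to guess the width.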

> As soon as one single real number
> differs more than a specified tolerance to the corresponding
> number from the other file, .false. should be returned.
>
> I already succeeded in checking if fileA and fileB have the same
> number of columns and the same number of lines, and if that is
> not the case i can already return .false. But now comes the
> harder part... to compare all numbers... what would be the best
> approach for this?
>
> I tried something like
>
> equal = .true.
> do
> read(unit=unit_set1, fmt="(F)") nb1
> read(unit=unit_set2, fmt="(F)") nb2
> if (nb1 /= nb2) then
> equal = .false.
> exit
> end if
> if (ios /= 0) exit
> end do


> Warning (100): Nonnegative width required in format string at (1)
> In file mod_testing.f95:116


As a minor nitpick, I'd recommend adopting the convention of using
.f90 for free source form and .f for fixed form. Look at the clf
archives for some well thought out reasons why this is usually a
better way to think instead of using the extension to specify a
particular standard revision.

The standard itself obviously doesn't care about this, it's more about
how the tools (compilers, editors etc.) work, and how to achieve
portability by "going with the flow". E.g. last I checked the Intel
compiler doesn't understand the .f95 extension.

> What would be the best approach to compare the numbers from these two
> data-files and return .false. from the moment two of them differ by
> a certain specified tolerance?


If you haven't done so, start by reading "What Every Computer
Scientist Should Know About Floating-Point Arithmetic" by Goldberg (you
can find a copy online). After reading that, you will realize that you
should compare floats with something like

if (abs (nb1 - nb2) > tol) then
equal = .false.
...
end if

where tol is the tolerance you have chosen.

Also note that a read statement reads an entire line (unless you use
advance='no', or access='stream' or somesuch). You probably want to
compare all the dimensions, not only the first. So make nb1 and nb2
arrays of rank 1 and size at least as big as the number of dimensions
(e.g. by using allocatables).
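
Putting the two points above together, a minimal sketch might look like this. It assumes the number of columns (ncols) is already known from the earlier column check, uses list-directed reads so the field width doesn't matter, and needs Fortran 2008 for newunit=. The names compare_rows and files_match are invented for the example.

```fortran
module compare_rows
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Returns .true. when every pair of corresponding numbers in the two
  ! files differs by at most tol (an absolute tolerance).
  function files_match(name_a, name_b, ncols, tol) result(equal)
    character(len=*), intent(in) :: name_a, name_b
    integer, intent(in) :: ncols
    real(dp), intent(in) :: tol
    logical :: equal
    real(dp), allocatable :: a(:), b(:)
    integer :: unit_a, unit_b, ios_a, ios_b

    allocate(a(ncols), b(ncols))
    open(newunit=unit_a, file=name_a, status='old', action='read')
    open(newunit=unit_b, file=name_b, status='old', action='read')
    equal = .true.
    do
      ! One read per row: each read fills all ncols values.
      read(unit_a, *, iostat=ios_a) a
      read(unit_b, *, iostat=ios_b) b
      if (ios_a /= 0 .or. ios_b /= 0) then
        ! Only a simultaneous end-of-file counts as a clean finish.
        equal = (ios_a < 0 .and. ios_b < 0)
        exit
      end if
      if (any(abs(a - b) > tol)) then
        equal = .false.
        exit
      end if
    end do
    close(unit_a)
    close(unit_b)
  end function files_match
end module compare_rows
```

One caveat: a list-directed read keeps going onto the next line if a row has fewer than ncols values, so this relies on the column count having been verified beforehand.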


--
Janne Blomqvist
David Frank

2005-09-13, 7:57 am


"Bart Vandewoestyne" <MyFirstName.MyLastName@telenet.be> wrote in message
news:1126599777.741441@seven.kulnet.kuleuven.ac.be...
>
> The problem is that i don't know the width in advance...
>


You are describing what list-directed (*) input was designed to handle.
The code below hasn't been tested.

! -----------------
program test
  real(8) :: x(3), y(3), tol = 1.d-6
  integer :: n, record

  open (1,file='file1.dat')
  open (2,file='file2.dat')

  do record = 1,1000000
    read (1,*,end=101) x
    read (2,*) y
    do n = 1,3
      if (abs(x(n)-y(n)) > tol) then
        write (*,*) 'tol exceeded, record = ', record
        stop
      end if
    end do
  end do
101 stop
end program


Thomas Koenig

2005-09-13, 6:59 pm

Bart Vandewoestyne <MyFirstName.MyLastName@telenet.be> wrote:
>I have data-files that look like this:
>
> 0.324 0.543 0.897
> 0.437 0.787 0.367
> 0.321 0.423 0.424
> 0.433 0.434 0.854
> ... and so on...


...

>Now i have 2 programs that generate such datafiles, and I want to
>compare the numbers in it. As soon as one single real number
>differs more than a specified tolerance to the corresponding
>number from the other file, .false. should be returned.


I usually use perl for this kind of task. In Fortran, I would
read each line into a character variable and split it up into
whitespace-delimited fields (stored in a character array). Each
field can then be converted to a real using an internal read with
the width of the field (trailing blanks won't do any harm).
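
A rough sketch of that approach follows. It is an illustration only: the names line_fields and parse_reals are made up, only spaces (not tabs) are treated as separators, and instead of storing the fields in a character array it converts each one as soon as it is found. The Fw.0 edit descriptor is used for the internal read because, on input, it also accepts exponent forms such as 8.97E-01.

```fortran
module line_fields
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Converts the space-delimited fields of line into reals.
  ! n returns the number of fields converted; ok is .false. if a field
  ! cannot be read or there are more fields than size(values).
  subroutine parse_reals(line, values, n, ok)
    character(len=*), intent(in) :: line
    real(dp), intent(out) :: values(:)
    integer, intent(out) :: n
    logical, intent(out) :: ok
    character(len=16) :: fmt
    integer :: start, finish, length, ios

    n = 0
    ok = .true.
    values = 0.0_dp
    length = len_trim(line)
    start = 1
    do
      ! Skip blanks in front of the next field.
      do while (start <= length)
        if (line(start:start) /= ' ') exit
        start = start + 1
      end do
      if (start > length) exit
      ! Advance finish to the last non-blank character of the field.
      finish = start
      do while (finish < length)
        if (line(finish+1:finish+1) == ' ') exit
        finish = finish + 1
      end do
      if (n == size(values)) then
        ok = .false.
        exit
      end if
      ! Build a format with the exact field width, e.g. (f5.0).
      write(fmt, '(a,i0,a)') '(f', finish - start + 1, '.0)'
      read(line(start:finish), fmt, iostat=ios) values(n+1)
      if (ios /= 0) then
        ok = .false.
        exit
      end if
      n = n + 1
      start = finish + 1
    end do
  end subroutine parse_reals
end module line_fields
```

Reading each line of both files with a '(a)' format and passing it through a routine like this gives the column count and the values in one pass.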
David Flower

2005-09-14, 7:57 am


David Frank wrote:
> "Bart Vandewoestyne" <MyFirstName.MyLastName@telenet.be> wrote in message
> news:1126599777.741441@seven.kulnet.kuleuven.ac.be...
>
> You are describing what list * input was designed to handle,
> below hasnt been tested
>
> ! -----------------
> program test
> real(8) :: x(3), y(3), tol = 1.d-6
> integer :: n, record
>
> open (1,file='file1.dat')
> open (2,file='file2.dat')
>
> do record = 1,1000000
> read (1,*,end=101) x
> read (2,*) y
> do n = 1,3
> if (abs(x(n)-y(n)) > tol) then
> write (*,*) 'tol exceeded, record = ', record
> stop
> end if
> end do
> end do
> 101 stop
> end program


Spot on, far simpler than any other proposed solution.

But there is a point that no one else has touched on - tolerances may
be absolute or relative - you have only specified absolute.

Relative would be:

ABS(X(N)-Y(N)) .GT. tol*MAX(ABS(X(N)),ABS(Y(N)))
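
A purely relative test breaks down when both values are at or near zero, so one common compromise is to accept a pair if it passes either test. The sketch below shows that idea; it is one possible design rather than anything prescribed in this thread, and the names tolerance_check, close_enough, abs_tol and rel_tol are invented here.

```fortran
module tolerance_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! .true. when x and y agree to within abs_tol, or to within rel_tol
  ! times the larger of their magnitudes, whichever bound is looser.
  elemental function close_enough(x, y, abs_tol, rel_tol) result(agree)
    real(dp), intent(in) :: x, y, abs_tol, rel_tol
    logical :: agree
    agree = abs(x - y) <= max(abs_tol, rel_tol * max(abs(x), abs(y)))
  end function close_enough
end module tolerance_check
```

Because the function is elemental it can be applied to whole rows at once, e.g. all(close_enough(x, y, 1.0e-9_dp, 1.0e-6_dp)).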

Dave Flower

bricknews@gmail.com

2005-09-16, 7:57 am

I find ndiff (http://www.math.utah.edu/pub/misc) works well for me.

Cheers
Neilen
