
 

Need help with large file reading and writing.
news.gatech.edu

2006-07-13, 6:59 pm


Hello Pals:

I am using Fortran 90 code, compiled with the 32-bit Intel compiler, to
read large files (about 1G each) and write the data to other open
files. At any one time I have only 3 files open, and I close the ones I
no longer need. The code always stops after reading 10 files. I am not
sure whether this is a compiler problem related to a limit on buffer
memory.

Thanks very much

Liscon


Simple code:

open(unit=18,file="aux.out")
do i=1,10
   ! old file; unit and file name are simplified for illustration
   open(unit=101+i,file="fort.1+i")
   ! new file
   open(unit=1001+i,file="fort.101+i")

   do j=1,EN   ! number of lines written to the new file
      read(101+i,format) n1,n2,n3,n4,n5,n6,n7
      write(1001+i,format) n1,n2,n3,n4,n5,n6,n7
   end do

   do j=EN+1,N   ! remaining lines; N = total lines in the old file
      read(101+i,format) n1,n2,n3,n4,n5,n6,n7
      write(1001+i,format) n1,n2,n3,n4,n5,n6,n7
   end do

   close(101+i)
   close(1001+i)
end do

Richard E Maine

2006-07-13, 6:59 pm

news.gatech.edu <liscon@gmail.com> wrote:

> I am using Fortran 90 code, compiled with the 32-bit Intel compiler, to
> read large files (about 1G each) and write the data to other open
> files. At any one time I have only 3 files open, and I close the ones I
> no longer need. The code always stops after reading 10 files. I am not
> sure whether this is a compiler problem related to a limit on buffer
> memory.


[sample code elided. I don't think it is telling me much about the
problem. I saw some "issues" in the sample code, but I'd guess they were
all artifacts of the attempt at simplification and have nothing to do
with the question. For example, the file names were... well, not what
one might think they were intended to be.]
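
To make the file-name point concrete: in the sample, "fort.1+i" is a
literal string, so every pass opens a file literally named fort.1+i
rather than fort.2, fort.3, and so on. One common way to build numbered
names is an internal write into a character variable. The sketch below
is only an illustration; the program name build_names, the variable
names, and the 3-pass loop are assumptions, not the original poster's
code.

```fortran
program build_names
  implicit none
  integer :: i
  character(len=32) :: old_name, new_name   ! hypothetical name buffers

  do i = 1, 3
     ! Internal write: format the integer into the character variable.
     ! The I0 edit descriptor writes the integer with no leading blanks.
     write(old_name, '(A,I0)') 'fort.', 1 + i
     write(new_name, '(A,I0)') 'fort.', 101 + i

     ! trim() drops the trailing blanks that pad the character variable.
     print '(A,1X,A)', trim(old_name), trim(new_name)
  end do
end program build_names
```

The resulting strings would then be passed as file=trim(old_name) in the
open statement.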

Not a lot of data to go on, but I doubt it has anything to do with
"buffer memory." Compilers don't need to buffer the whole file or
anything like that. Large records might cause buffer size "issues", but
I don't see any suggestion that your record sizes are particularly
large.

My first guess would be that the compiler and/or operating system (you
didn't say what operating system that I noticed; the Intel compiler runs
on several) don't support files as large as you are trying to write.

My second guess would be a memory leak somewhere. If so, simplified
illustrative code would not be enough to find it. The old MS
Powerstation 4 compiler used to have memory leaks in formatted I/O. I
wouldn't expect that out of Intel, but I can't rule it out.

P.S. You mention files of gigabyte size, and your example shows
formatted I/O. I assume you know that formatted I/O is *VERY*
inefficient, both in terms of file size and time. Maybe you have good
reason to use formatted I/O anyway; maybe you don't even have a choice.
And maybe performance is not an issue. But I thought I'd mention it just
in case.
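
To illustrate the formatted versus unformatted distinction, here is a
minimal sketch, assuming records of 7 default integers as in the sample
code. The file names demo.bin and copy.bin, the unit numbers, and the
record count are hypothetical. Unformatted files hold the internal
binary representation, so no text conversion is done on read or write;
the resulting files are generally not portable between compilers or
platforms.

```fortran
program unformatted_copy
  implicit none
  integer, parameter :: nrec = 5      ! hypothetical record count
  integer :: n(7), j, ios

  ! Create a small unformatted (binary) sequential file to copy from.
  open(unit=21, file='demo.bin', form='unformatted', &
       status='replace', action='write')
  do j = 1, nrec
     n = j
     write(21) n                      ! no format: raw binary record
  end do
  close(21)

  ! Copy record by record until end of file is reached.
  open(unit=21, file='demo.bin', form='unformatted', &
       status='old', action='read')
  open(unit=22, file='copy.bin', form='unformatted', &
       status='replace', action='write')
  do
     read(21, iostat=ios) n
     if (ios /= 0) exit               ! nonzero: end of file or error
     write(22) n
  end do
  close(21)
  close(22)
end program unformatted_copy
```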

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain

Terry

2006-07-14, 4:00 am

I am reading your post as meaning you open and close files one after
another and the program stops after doing this 10 times.

The MS operating system allows you to execute in a DOS environment and
in a Windows console environment (and others) but these two can both be
used for a DOS-target compilation. Both have a limit on the number of
files being open at the same time. 5 files are already taken internally
but are in the count of active data control blocks. IF and only IF this
is what you are doing you have to change the number of DCB file blocks
to be assigned at program start. This is usually done in config.nt
for NT and XP systems (in \WINNT\System32 or a similar folder) and in
config.sys in the disc root (C:\) in earlier Windows systems.

I had this problem with a general sort-merge routine where I sometimes
needed 32 work files. I had to write an NT-specific native Windows
program because NT comes with a fixed limit for DOS emulations.

I don't think you have a file size problem, though it is hard to say
without more information.

Terence Wright

Richard Maine

2006-07-14, 4:00 am

Terry <tbwright@cantv.net> wrote:

> IF and only IF this
> is what you are doing you have to change the number of DCB file blocks
> to be assigned at program start.


The OP specifically mentioned f90. Do there exist any f90 compilers that
even can use old DOS DCB file blocks? I guess I don't know for sure that
there aren't, but my money goes on the side that says the problem lies
elsewhere. It has sure been a long time since I've had to pay any
attention to DCB file blocks. I've done it, but it has been a long time.
I sure don't recall it ever coming up in f90 compilers.

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain

glen herrmannsfeldt

2006-07-14, 4:00 am

Terry wrote:

> I am reading your post as meaning you open and close files one after
> another and the program stops after doing this 10 times.


> The MS operating system allows you to execute in a DOS environment and
> in a Windows console environment (and others) but these two can both be
> used for a DOS-target compilation. Both have a limit on the number of
> files being open at the same time. 5 files are already taken internally
> but are in the count of active data control blocks. IF and only IF this
> is what you are doing you have to change the number of DCB file blocks
> to be assigned at program start.


If I remember for DOS there are DCB and FCBS parameters, which specify
the DCBs used by DOS 1.x programs, and FCBS used by programs for later
versions of DOS. Hopefully most compilers available now use newer
methods than DCB, which doesn't allow for a path including subdirectories.

-- glen

Herman D. Knoble

2006-07-14, 7:59 am

Liscon: Your program looks OK. Others made suggestions about formatted
I/O, but that is unlikely to change the problem you're having. I suggest
that you check the following:

1) Check how much RAM and corresponding virtual storage you have.
Many people suggest that virtual storage should be about 2.5 times RAM.
If that number is less than 3GB, then your suspicions are likely true.
(Note: the Windows Control Panel System icon will show RAM size;
typically, for the System icon, the Performance button of the Advanced
tab will show the virtual storage allocation.) If you are running under
Linux, try the following commands:
   free -m
   cat /proc/cpuinfo > t.t
   cat /proc/meminfo > t.t

2) Make sure you're using the latest compiler version.

3) Try a different compiler; for example, G95: http://g95.sourceforge.net/

Good luck with it.

Skip Knoble


news.gatech.edu

2006-07-14, 7:00 pm


Thanks very much!
To clarify the problem:

Memory leak: I did use some allocatable matrices and forgot to
deallocate them. After you mentioned it, I tried deallocating them, and
also declaring them as static instead, but the same problem happened,
so a memory leak is probably not the cause. The only variables used
often in the code are 7 integers.

Compiler: I use the Intel compiler (version 8.1) on Linux.

Also, the code runs smoothly for 64 files when the files are small,
say 100M each. When I increase the file size to about 400M each, it
stops at 32 files. And when the file size increases to about 1G, it
handles only the first 10 files.

It always stops after about 800 seconds.

The error output is:
Exited with exit code 1.

Resource usage summary:

CPU time : 832.00 sec.
Max Memory : 2 MB
Max Swap : 19 MB

Max Processes : 1
Max Threads : 1

The output (if any) follows:

Timeout alarm signaled

TID  HOST_NAME  COMMAND_LINE     STATUS                  TERMINATION_TIME
==== ========== ================ ======================= ===================
0001 compute-6- mpirun_ssh -np 1 Killed by PAM (SIGTERM) 07/13/2006 23:08:10


As for formatted versus unformatted I/O, I did not know much about the
difference in efficiency. Thanks very much; I will try it right now.

Liscon




news.gatech.edu

2006-07-14, 7:00 pm




Thanks for new solutions.

Since I am using a Linux system, is there still a DCB file block
problem? If so, how can I change the parameters?

As Herman suggested, I used the command "free -m" and got an output file
t.t with the following information:

total: used: free: shared: buffers: cached:
Mem: 6238453760 6204731392 33722368 0 140128256 5121093632
Swap: 2147475456 1241088 2146234368
MemTotal: 6092240 kB
MemFree: 32932 kB
MemShared: 0 kB
Buffers: 136844 kB
Cached: 5000004 kB
SwapCached: 1064 kB
Active: 3127256 kB
ActiveAnon: 702984 kB
ActiveCache: 2424272 kB
Inact_dirty: 2085100 kB
Inact_laundry: 474264 kB
Inact_clean: 151596 kB
Inact_target: 1167640 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 6092240 kB
LowFree: 32932 kB
SwapTotal: 2097144 kB
SwapFree: 2095932 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB

So which parameter should I change, and how can I change it?


Thanks all you guys
Liscon







Richard E Maine

2006-07-14, 7:00 pm

news.gatech.edu <liscon@gmail.com> wrote:


> Since I am using a Linux system, is there still a DCB file block
> problem?


No. That one I know for sure. DCBs are purely a DOS thing - ancient
versions of DOS at that. I don't think there are even DCB issues with
any f90 compilers on Windows, but that part I could be wrong on. No
way that DCBs have anything to do with Linux.

I'm afraid I don't have any further constructive suggestions other than
the possibilities I mentioned before. Your numbers from another posting
all sure seem to indicate that you stop at about the same total size. I
have a hard time imagining that this is a coincidence. It would, of
course, also correspond to processing about the same total number of
records, and thus, for example, the same number of repetitions of
anything that might leak memory. Many things could be correlated with
the file size, but your numbers pretty convincingly support my thesis
that it is likely not related to the number of files. I didn't think it
was anyway, so maybe I'm just seeing my prior prejudices in the data -
but I don't think so.

(It should be "obvious" and I'd guess it was not your problem, but I've
seen it missed before, so just in case... you aren't running out of
disk space are you?)
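
One way a failure of that kind might be made visible is to check iostat
on each I/O statement instead of letting the run-time library abort.
This is a minimal sketch only; the file name check.out, the unit number,
and the format are assumptions. Because output is usually buffered, an
error such as a full disk may only be reported on a later write or at
close.

```fortran
program check_io
  implicit none
  integer :: ios, n(7)

  n = 0
  open(unit=31, file='check.out', status='replace', action='write', &
       iostat=ios)
  if (ios /= 0) then
     print *, 'open failed, iostat =', ios
     stop
  end if

  ! A nonzero iostat reports the failure instead of stopping the program.
  write(31, '(7I10)', iostat=ios) n
  if (ios /= 0) print *, 'write failed, iostat =', ios

  ! Buffered data is flushed at close, so check the status here as well.
  close(31, iostat=ios)
  if (ios /= 0) print *, 'close failed, iostat =', ios
end program check_io
```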

--
Richard Maine | Good judgment comes from experience;
email: my first.last at org.domain| experience comes from bad judgment.
org: nasa, domain: gov | -- Mark Twain

Herman D. Knoble

2006-07-14, 7:00 pm

Liscon: It looks like you have plenty of memory for buffers.

Can you upgrade to Intel Compiler v9.1?
Or can you try G95 for Linux?
Or try the 64-bit Intel compiler? (If you have integer counters,
and you are counting past the 2GB line (2**31-1), then you would
need to declare such integer counters Integer*8.)
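
A minimal sketch of the counter point, assuming a counter that
accumulates bytes. Integer*8 is a widely supported extension;
selected_int_kind(18) is the standard way to request an integer kind
with at least 18 decimal digits. The names i8 and nbytes are
hypothetical.

```fortran
program big_counter
  implicit none
  ! Kind able to hold at least 18 decimal digits (64-bit in practice).
  integer, parameter :: i8 = selected_int_kind(18)
  integer(kind=i8) :: nbytes
  integer :: k

  nbytes = 0_i8
  do k = 1, 3
     nbytes = nbytes + 1073741824_i8   ! add 2**30 bytes per pass
  end do

  ! The total exceeds huge(0), the largest default integer (2**31-1 on
  ! typical compilers), so a default integer counter would overflow.
  print *, 'total bytes   =', nbytes
  print *, 'default limit =', huge(0)
end program big_counter
```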

Skip



Terry

2006-07-14, 9:59 pm

Now I am leaning towards running out of disc space as the problem under
discussion, if it stops after 32 files of about 400M each, but after
only 10 files of about 1G each. If there were a pure file count/memory
problem (counts of Windows file buffer and control blocks), it would
always stop at 10 (say); so YES, check for disc space availability.

And yes, DCB's (Data Control Block) counts ARE an issue (also known as
FCB File Control Blocks) in all MSDOS from 1.x up to the most recent
issue of DOS (v7, I think) and those DOS emulators still present in NT,
XP and MAC, and important if you compile and TARGET for a DOS O/S
instead of Windows, as you can with MS Professional Fortran. (I use F77
whenever possible, else use CVF 6.6 for pure Windows if I need port I/O
which is not now supported in the DOS emulator). But you can use long
names, because DOS only uses the control part of the DCB after locating
the file from the specified full name; you won't find all the file name
in the DCB anymore.

Terence Wright

Kevin G. Rhoads

2006-07-17, 7:00 pm

>And yes, DCB's (Data Control Block) counts ARE an issue (also known as
>FCB File Control Blocks) in all MSDOS from 1.x up to the most recent
>issue of DOS (v7, I think) and those DOS emulators still present in NT,
>XP and MAC, and important if you compile and TARGET for a DOS O/S
>instead of Windows, as you can with MS Professional Fortran.


This is bogus. FCBs are only an issue if you target DOS 1.x. The last version
of MS Fortran which COULD target DOS 1.x was version 3.2, which was subset F77,
and 16 bit only. All later versions of MS Fortran use file handles not FCBs,
so FCB issues are irrelevant. Even v3.2 could target DOS 2.0, and thus use file
handles, it only offered DOS 1.x for backward compatibility.

Yes, even later versions of DOS and DOS emulators will have issues if FCBs
are used, but unless OP has a DOS 1.x targeting F90 compiler from some alternate
universe, which I highly doubt, and probably wouldn't be running under Linux
anyway, and couldn't address 1G files even if he had such a chimeric fantasy ...
well, it is just bogus.

Terence

2006-07-19, 7:01 pm

Kevin G. Rhoads wrote:
(About DCB blocks in config.sys)

> This is bogus. FCBs are only an issue if you target DOS 1.x. The last version
> of MS Fortran which COULD target DOS 1.x was version 3.2, which was subset F77,
> and 16 bit only. All later versions of MS Fortran use file handles not FCBs,
> so FCB issues are irrelevant. Even v3.2 could target DOS 2.0, and thus use file
> handles, it only offered DOS 1.x for backward compatibility.
>
> Yes, even later versions of DOS and DOS emulators will have issues if FCBs
> are used, but unless OP has a DOS 1.x targeting F90 compiler from some alternate
> universe, which I highly doubt, and probably wouldn't be running under Linux
> anyway, and couldn't address 1G files even if he had such a chimeric fantasy ...
> well, it is just bogus.


Kevin is NOT correct.
The count and use of DCB's are an issue in for-DOS compilations.

We still commercially use MS Fortran V3.31 of 1985 (and also CVF 6.6
and Intel V9 compilers), which is OK for all current MSDOS systems and
Windows emulators (including Mac). It is also OK for all console-mode
compilations that don't use the communication ports (now not fully
supported by XP in some frame and parity combinations).

We have two major commercial systems, each with one F90-compiled
user-interface F77 program that then calls 60 or more DOS-targeted
programs quite correctly. This way we have one large (bloated)
compilation of F77 code for a Windows user interface, and many scores
of perfectly OK, small (30k-100k) executables that get called as child
processes to do the needed work and which can be updated with new
features and distributed world-wide by e-mail. This method avoids
recompiling about 200 modules into a single program for every update
(and a huge problem in extensive distribution of umpty-megabytes of
executables). All programs are thus separate entities.

We had a problem last year with XP user clients not being able to sort
very large files (with our general-purpose sort/merge subroutine)
because it ran out of DCB blocks at 20, when 32 files were needed.

It turned out XP only provides 20 blocks in the DOS emulator and
disregards the CONFIG.SYS (or .NT) instruction to increase to 32 that
we had supplied for the software installation.
So we wrote a new F90-compatible version with no such limitations.

I think commenters should bear in mind that every other contributor is
speaking from actual experience, so that, if there is a difference of
opinion, perhaps the facts are being misinterpreted. Words like "bogus"
should be avoided.

Terence Wright


Copyright 2008 codecomments.com