
Author writing at the end of the lines
Kamaraju Kusumanchi

2005-11-25, 3:58 am

Let's say I have a file whose contents are

1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16

I want to add a column at the end so that the above file becomes

1 2 3 4 17
5 6 7 8 18
9 10 11 12 19
13 14 15 16 20

How can I do this efficiently in Fortran 90? I am thinking of reading
each line into a temporary string, appending the desired value to this
string, and printing the new string again. But this method seems too
brute force and is probably very inefficient if I have a large number
of columns (say 1000) and want to do this, say, 500 times. Is there any
other efficient way to accomplish this (i.e., appending an extra column
without reading the entire line and thus the whole file)?

thanks
raju


--
Kamaraju S Kusumanchi
http://www.people.cornell.edu/pages/kk288/
http://malayamaarutham.blogspot.com/
Ian Bush

2005-11-25, 3:58 am

Kamaraju Kusumanchi wrote:

> Let's say I have a file whose contents are
>
> 1 2 3 4
> 5 6 7 8
> 9 10 11 12
> 13 14 15 16
>
> I want to add a column at the end so that the above file becomes
>
> 1 2 3 4 17
> 5 6 7 8 18
> 9 10 11 12 19
> 13 14 15 16 20
>
> How can I do this efficiently in Fortran 90? I am thinking of reading
> each line to a temporary string and appending to this string the desired
> value and printing the new string again.


This, or variants of it, is the only way I can think of doing this in Fortran.
You might get a bit less dismal performance reading a number of lines, appending
and then writing them all out, but that's about it. However ...

> if I have a large number
> of columns (say 1000) and want to do this say 500 times.


Not totally sure I understand what you want, but is the file too big to hold
entirely in memory, append all that you need, and then write out again? Or,
failing that, do more or less the above: read it in big chunks, do all the
appending you require, and then write out?
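
A minimal sketch of the line-by-line variant discussed above, for illustration only. The file names, the 4096-character line buffer, and the appended values (17, 18, ...) are assumptions taken from or added to the example in the question; the I0 edit descriptor needs Fortran 95 or later.

```fortran
! Sketch: copy a text file line by line, appending one value to each line.
! Assumptions: 'data.txt' exists, no line is longer than maxlen characters
! (longer lines would be truncated), and the new column starts at 17.
program append_column
  implicit none
  integer, parameter :: maxlen = 4096
  character(len=maxlen) :: line
  character(len=32) :: extra
  integer :: ios, newval

  open(unit=10, file='data.txt', status='old', action='read')
  open(unit=20, file='data_new.txt', status='replace', action='write')

  newval = 17
  do
     read(10, '(A)', iostat=ios) line
     if (ios /= 0) exit                       ! end of file or read error
     write(extra, '(I0)') newval              ! format the new value as text
     write(20, '(A,1X,A)') trim(line), trim(extra)
     newval = newval + 1
  end do

  close(10)
  close(20)
end program append_column
```

The whole file is still read and rewritten once per added column, which is the cost the later replies try to avoid.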

Ian


Richard Maine

2005-11-25, 3:58 am

Kamaraju Kusumanchi <kk288@cornell.edu> wrote:

> I want to add a column at the end...

...
> How can I do this efficiently in Fortran 90?...
> Is there any
> other efficient way to accomplish this (i.e., appending an extra column
> without reading the entire line and thus the whole file)?


This isn't really language-related at all. The simplest answer is that
it isn't possible without reading, and for that matter, rewriting the
whole file. You don't need to think about Fortran. Just look at what you
want in the file. You can't just insert things in the middle of a file.
Well, not a simple sequential text file anyway.

Making this more efficient would probably involve reorganizing how the
file was stored. One could probably do this lots of ways, depending on
requirements. The most "obvious" reorganization is to transpose it so
that the rows become columns, and vice versa. Appending a line (row) to
a file is a much simpler matter. Whether such a reorganization would
mess up everything else or not, I can't judge. You might be able to
reorganize the file like that temporarily, and then swap it back later,
after the appending, if that was a problem.
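
As a rough illustration of why the transposed layout is cheaper: a new row can be added by opening the file with POSITION='APPEND', so nothing already in the file is read or rewritten. The file name, the row length m, and the values written are hypothetical.

```fortran
! Sketch: append one row to a transposed file without touching existing data.
! Assumptions: 'transposed.txt' is the (hypothetical) transposed file and
! each row holds m values.
program append_row
  implicit none
  integer, parameter :: m = 4
  real :: f(m)
  integer :: i

  f = (/ (real(i), i = 17, 20) /)            ! placeholder data for the new row

  open(unit=10, file='transposed.txt', status='unknown', &
       position='append', action='write')
  ! The repeat count only needs to be at least m; output stops with the list.
  write(10, '(1000(1X,ES14.6))') f
  close(10)
end program append_row
```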

Or read the whole file into memory and do the operations in memory. If
that works, it will probably be hugely more efficient than fiddling with
the file.
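
A sketch of the in-memory route, assuming the table dimensions are known in advance: read the file once, fill in all the new columns in an array, and write the result once. The dimensions, file names, and placeholder values are assumptions for illustration.

```fortran
! Sketch: read the table once, add nnew columns in memory, write once.
! Assumptions: the file has nrow rows of ncol numbers; the new values
! are placeholders that continue the numbering of the example.
program add_columns_in_memory
  implicit none
  integer, parameter :: nrow = 4, ncol = 4, nnew = 2
  real :: a(nrow, ncol + nnew)
  integer :: i, j, k

  open(unit=10, file='data.txt', status='old', action='read')
  do i = 1, nrow
     read(10, *) (a(i, j), j = 1, ncol)
  end do
  close(10)

  do k = 1, nnew
     do i = 1, nrow
        a(i, ncol + k) = real(16 + (k - 1)*nrow + i)
     end do
  end do

  open(unit=20, file='data_new.txt', status='replace', action='write')
  do i = 1, nrow
     write(20, '(1000(1X,F10.3))') (a(i, j), j = 1, ncol + nnew)
  end do
  close(20)
end program add_columns_in_memory
```

With this layout, adding 500 columns costs one read and one write of the file rather than 500 of each.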

Mostly, this isn't a programming language problem. It is a data
organization and algorithm problem. Come up with the data organization
and algorithm first. Then worry about how to do that in Fortran.


--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
Ron Shepard

2005-11-25, 3:58 am

In article <dm69t5$7va$1@ruby.cit.cornell.edu>,
Kamaraju Kusumanchi <kk288@cornell.edu> wrote:

> How can I do this efficiently in Fortran 90?


This really isn't something specific to the language; appending data
to lines within a file is a relatively difficult thing to do. If
you control the format of the data, then one possibility might be to
transpose rows and columns in the file; this way you would be adding
new lines to the file, which is relatively easy to do. Another
possibility might be to include not only the value, but also its row
and column index in the file; this way you can add either rows or
columns as necessary by appending the new entries to the end of the
file.
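
One possible reading of the index-plus-value layout, sketched under assumptions: each line holds a row index, a column index, and a value, so a new column is just more lines appended to the end. The file name, field widths, and data are hypothetical.

```fortran
! Sketch: store each entry as "row  column  value" so that new rows or
! columns are added by appending lines to the end of the file.
! Assumptions: 'triplets.txt' is a hypothetical file in this layout.
program append_triplets
  implicit none
  integer, parameter :: nrow = 4
  integer :: i, newcol
  real :: colvals(nrow)

  newcol = 5                                  ! index of the column being added
  colvals = (/ 17.0, 18.0, 19.0, 20.0 /)      ! placeholder values

  open(unit=10, file='triplets.txt', status='unknown', &
       position='append', action='write')
  do i = 1, nrow
     write(10, '(I8,1X,I8,1X,ES14.6)') i, newcol, colvals(i)
  end do
  close(10)
end program append_triplets
```

The trade-off is a larger file and a reader that has to reassemble the table from the indices.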

$.02 -Ron Shepard
Kamaraju Kusumanchi

2005-11-25, 7:57 am

Richard Maine wrote:
> Kamaraju Kusumanchi <kk288@cornell.edu> wrote:
>
>
>
> ..
>
>
>
> This isn't really language-related at all. The simplest answer is that
> it isn't possible without reading, and for that matter, rewriting the
> whole file. You don't need to think about Fortran. Just look at what you
> want in the file. You can't just insert things in the middle of a file.
> Well, not a simple sequential text file anyway.
>
> Making this more efficient would probably involve reorganizing how the
> file was stored. One could probably do this lots of ways, depending on
> requirements. The most "obvious" reorganization is to transpose it so
> that the rows become columns, and vice versa. Appending a line (row) to
> a file is a much simpler matter. Whether such a reorganization would
> mess up everything else or not, I can't judge. You might be able to
> reorganize the file like that temporarily, and then swap it back later,
> after the appending, if that was a problem.
>


First of all, thank you all for the replies.

Actually I have been through this path. This is my original situation.
Imagine I am writing a program which gives the value of f(x,t) at a
given time step t_i. Let there be m grid points in x direction and the
idea is to see how f(x,t) evolves over time in n time steps.

Now due to the size of the arrays and the large number of time steps
required etc., it is not an option to store all the data (which would
require storage of m*n data points). So at any given point of time I
store only the x array (which will be m data points).

When I write the output of this program on to a file the output looks
something like this:

time x_1 x_2 x_3 x_4 ..... x_m
t_0
t_1
t_2
Richard Maine

2005-11-25, 7:00 pm

Only one small comment I have to add.

Kamaraju Kusumanchi <kk288@cornell.edu> wrote:

> Now due to the size of the arrays and the large number of time steps
> required etc., it is not an option to store all the data


I urge you to think a little more along this line to make sure that you
have accurately evaluated whether this is an option. Memory tends to be
larger than it used to be (by quite a lot), and virtual memory helps
too. I have seen several people spend a lot of time worrying about
amounts of memory that aren't worth the trouble. While one certainly
does want to think about resource requirements, one also needs to
have a realistic evaluation of the cost of the resource versus, for
example, the cost of programmer time.

You mentioned what I interpreted as on the order of 1000 points of x. If
you have only about 1000 points of time, that would be only an array of
about a million elements. You could plausibly go quite a bit larger than
that without running into a lot of trouble on today's machines.

It may be that you are correct that it isn't an option for you, but I
just wanted to make sure that you hadn't dismissed the possibility too
rapidly, without thinking through the numbers - I've seen people do
that.

And if you can't store the whole file, another possibility is to block
the problem. Store as big a chunk as you reasonably can, which
would certainly be a lot more than one time frame, even if it is less
than all of them.
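
For scale, a 1000 x 1000 array of default (4-byte) reals is about 4 MB, or about 8 MB in double precision. The blocking idea might look like the sketch below: keep nbuf time steps in memory and flush them as one block when the buffer fills. The sizes, the file name, and the placeholder computation are assumptions.

```fortran
! Sketch: buffer nbuf time steps in memory and write them out as one block.
! Assumptions: m, nsteps and nbuf are illustrative; the assignment inside
! the loop stands in for the real computation of f(x,t).
program blocked_output
  implicit none
  integer, parameter :: m = 1000              ! grid points in x
  integer, parameter :: nsteps = 2500         ! total number of time steps
  integer, parameter :: nbuf = 1000           ! time steps held in memory
  real, allocatable :: buf(:,:)
  integer :: step, j, k

  allocate(buf(m, nbuf))
  open(unit=10, file='blocks.dat', form='unformatted', &
       status='replace', action='write')

  k = 0
  do step = 1, nsteps
     k = k + 1
     do j = 1, m
        buf(j, k) = real(step) + 0.001*real(j)   ! placeholder for f(x_j, t_step)
     end do
     if (k == nbuf .or. step == nsteps) then
        write(10) k                           ! number of time steps in this block
        write(10) buf(:, 1:k)                 ! the block itself
        k = 0
     end if
  end do

  close(10)
  deallocate(buf)
end program blocked_output
```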

....
> When I write the output of this program on to a file the output looks
> something like this:
>
> time x_1 x_2 x_3 x_4 ..... x_m
> t_0
> t_1
> t_2
> .
> .


That's a very typical organization for time series data.

--
Richard Maine | Good judgement comes from experience;
email: last name at domain . net | experience comes from bad judgement.
domain: summertriangle | -- Mark Twain
glen herrmannsfeldt

2005-11-25, 7:00 pm

Kamaraju Kusumanchi wrote:
> Richard Maine wrote:

Some time ago someone wanted a file that could have data added
(prepended) to the beginning, similar to the way one can add data at the
end. It would not be so hard to design a file system like that, but
with low demand, as far as I know it hasn't been done.

In this case, the problem is not so uncommon, and is commonly
implemented in database systems such as PostgreSQL and MySQL, to name
two of the freely available systems. Most likely the implementation
will store each data column as a separate sequential data structure
making it fairly easy to add columns, though less convenient for
sequential reading of the entire file.

(snip regarding transposition to rows instead of columns)
> First of all, Thank you all for the replies.


> Actually I have been through this path. This is my original situation.
> Imagine I am writing a program which gives the value of f(x,t) at a
> given time step t_i. Let there be m grid points in x direction and the
> idea is to see how f(x,t) evolves over time in n time steps.


> Now due to the size of the arrays and the large number of time steps
> required etc., it is not an option to store all the data (which would
> require storage of m*n data points). So at any given point of time I
> store only the x array (which will be m data points).


If the array is really too large for the real memory available on your
machine then you are stuck. As Richard suggests later, it might not be.

Personally, I am against programs that unnecessarily read in a whole
file when the data is processed sequentially, but in this case it seems
like the right thing to do.

> When I write the output of this program on to a file the output looks
> something like this:


(snip of data with x across the page and t down the page)

> where I append the data at the new time step to the end of the file.


> But the problem with the above approach is that most of the plotting
> programs I tried (gnuplot, labplot, etc.) plot only across columns and
> cannot plot across rows (I am using Debian Sid if it matters). Due to
> this weird limitation of the plotting software I cannot plot the
> evolution of f(x,t) vs x over time without any external processing.


> For example if I am able to write my data as


(snip of transposed data)

> then it is trivial to plot evolution of f(x,t) vs x for different times
> using gnuplot without the need of external processing of data. Now I
> really would like to eliminate doing something extra beside the fortran
> program and that is why I asked the original question.


Well, that might be because that is the way most people want to look at
their data. Many systems have a record length limitation, and those
that don't make it inconvenient to look at data with long records.
Printers make it easy to print data that is 80 characters across and
millions of lines long, but not the other way around.

It is fairly easy to write a plotting program to plot data from columns,
reading the file a line at a time while ignoring the columns not needed.
That works pretty much independent of the length of the file.

It is not so hard to read a file and plot by rows, for a small number of
columns. This would tend to require the reading program to store all
the rows being plotted in memory.

> I am sure some of you must have faced a similar problem. What do you
> guys do? If there is a plotting program which is sane enough to
> recognize that plotting across rows is as useful as plotting across
> columns, I would like to use it.


I would always write the file with the potentially unlimited direction,
I believe t in your case, as the number of rows. It may, then, require
an external program to convert, but usually that program will supply
features that are needed for the particular data, and not generally
available in plotting programs. If the data set is extremely large, it
may require multiple passes through the file, or storing data in a
temporary file.
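
A minimal sketch of such an external conversion step, assuming the table is small enough to hold in memory and its dimensions are known: read the file with t down the page and write it back with t across the page. The file names and dimensions are hypothetical.

```fortran
! Sketch: transpose a small text table so that rows become columns.
! Assumptions: 'data.txt' has nrow lines of ncol numbers and fits in memory.
program transpose_file
  implicit none
  integer, parameter :: nrow = 4, ncol = 4
  real :: a(ncol, nrow)
  integer :: i, j

  open(unit=10, file='data.txt', status='old', action='read')
  do i = 1, nrow
     read(10, *) (a(j, i), j = 1, ncol)       ! line i of the input
  end do
  close(10)

  open(unit=20, file='data_t.txt', status='replace', action='write')
  do j = 1, ncol
     ! Output line j holds column j of the input.
     write(20, '(1000(1X,ES14.6))') (a(j, i), i = 1, nrow)
  end do
  close(20)
end program transpose_file
```

For a file too large for memory, the same idea would need several passes or a temporary file, as described above.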

Whenever there is too much data to fit into memory, methods using disk
as temporary storage are needed. That can be done with virtual memory,
writing temporary files, or multiple passes through the source file.

Database systems tend to block data into convenient size blocks, maybe
about 4K bytes, and read/write those blocks as a direct access file.
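
In Fortran terms, fixed-size blocks map naturally onto a direct-access file. The sketch below is an illustration, not a description of how any database does it: it stores one time step per record and then reads back a single record by number without scanning the rest. The file name, sizes, and data are assumptions.

```fortran
! Sketch: one fixed-length record per time step in a direct-access file.
! Assumptions: 'steps.bin', m and nsteps are illustrative; the data written
! is a placeholder.
program direct_access_demo
  implicit none
  integer, parameter :: m = 5, nsteps = 3
  real :: f(m)
  integer :: reclen, step, j

  inquire(iolength=reclen) f                  ! record length in processor units
  open(unit=10, file='steps.bin', access='direct', form='unformatted', &
       recl=reclen, status='replace', action='readwrite')

  do step = 1, nsteps
     f = (/ (real(10*step + j), j = 1, m) /)
     write(10, rec=step) f                    ! record number = time step
  end do

  read(10, rec=2) f                           ! fetch time step 2 directly
  print *, f
  close(10)
end program direct_access_demo
```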

-- glen

Janne Blomqvist

2005-11-26, 7:56 am

In article <-I-dnTd6J-VLwhreRVn-uw@comcast.com>, glen herrmannsfeldt wrote:
> Some time ago someone wanted a file that could have data added
> (prepended) to the beginning, similar to the way one can add data at the
> end. It would not be so hard to design a file system like that, but
> with low demand, as far as I know it hasn't been done.
> In this case, the problem is not so uncommon, and is commonly
> implemented in database systems such as PostgreSQL and MySQL, to name
> two of the freely available systems.


No, it's not implemented in relational database systems, since one of
the fundamental properties of a RDBMS is that tuples (rows) are
unordered. If you want to impose some specific order on your result
set, you must instruct the database to return the results of your
query in some order (the ORDER BY clause in SQL IIRC). Depending on
whether the relevant column in the table is indexed or not, the
database might need to sort the result set before returning it.

> Most likely the implementation
> will store each data column as a separate sequential data structure
> making it fairly easy to add columns, though less convenient for
> sequential reading of the entire file.


Again, no. All the major RDBMSs store data row-wise since that results
in much better performance for the vast majority of usage scenarios.

For PostgreSQL you can see the on-disk data layout in the manual at
http://candle.pha.pa.us/main/writin...ml/storage.html

As for adding columns, it works since IIRC the value for the column
for all existing rows is NULL, so the database engine can understand
that if the data is missing from the tuple, it means a tuple where the
new column is NULL. When you alter such a row, it will be rewritten in
another place (where space is available) with the new column(s), and
the old tuple is marked invalid. Similar to supporting variable length
columns.

> Database systems tend to block data into convenient size blocks, maybe
> about 4K bytes, and read/write those blocks as a direct access file.


At least you got that one right. 1 out of 3, congratulations. ;-)


--
Janne Blomqvist
Janne Blomqvist

2005-11-26, 7:56 am

In article <dm745o$4dl$1@ruby.cit.cornell.edu>, Kamaraju Kusumanchi wrote:
> But the problem with the above approach is that most of the plotting
> programs I tried (gnuplot, labplot, etc.) plot only across columns and
> cannot plot across rows (I am using Debian Sid if it matters). Due to
> this weird limitation of the plotting software I cannot plot the
> evolution of f(x,t) vs x over time without any external processing.


> I am sure some of you must have faced a similar problem. What do you
> guys do? If there is a plotting program which is sane enough to
> recognize that plotting across rows is as useful as plotting across
> columns, I would like to use it.


I recommend R ( http://www.r-project.org/ ). It's actually a
statistics package, but IMHO it has the best plotting of all the
solutions I have tried. The syntax is a bit weird though. As you're
using Debian, R is available as package "r-base" (plus a lot of
additional packages with "r-*" names).

Then there's of course MATLAB and its free "clones", Octave and Scilab
(packages "octave*" and "scilab*" in Debian).

All of the solutions above deal with data as arrays, so you can plot
row-wise or column-wise depending on how you choose your array
slices. foo(:, col) vs. foo(row, :) just like F90+.
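
For comparison, the same two slices in Fortran 90 itself; the array name foo follows the text, and the contents are made up for illustration.

```fortran
! Sketch: row and column slices of a 2-D array in Fortran 90.
program slices
  implicit none
  integer :: foo(3, 4), i, j

  do j = 1, 4
     do i = 1, 3
        foo(i, j) = 10*i + j                  ! element (i,j) encodes its indices
     end do
  end do

  print *, foo(2, :)                          ! row 2:    21 22 23 24
  print *, foo(:, 3)                          ! column 3: 13 23 33
end program slices
```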


--
Janne Blomqvist
glen herrmannsfeldt

2005-11-26, 6:58 pm

Janne Blomqvist wrote:

(snip)

> No, it's not implemented in relational database systems, since one of
> the fundamental properties of a RDBMS is that tuples (rows) are
> unordered. If you want to impose some specific order on your result
> set, you must instruct the database to return the results of your
> query in some order (the ORDER BY clause in SQL IIRC). Depending on
> whether the relevant column in the table is indexed or not, the
> database might need to sort the result set before returning it.


There are two questions here. One is the physical structure of the file
and one is the logical structure. The OP wants a system with a logical
structure where columns can be added, and RDBMS supply that. They tend
to make no guarantee on the efficiency of almost anything you do, though.

> Again, no. All the major RDBMSs store data row-wise since that results
> in much better performance for the vast majority of usage scenarios.


> For PostgreSQL you can see the on-disk data layout in the manual at
> http://candle.pha.pa.us/main/writin...ml/storage.html


Thanks, I didn't know about that one.

> As for adding columns, it works since IIRC the value for the column
> for all existing rows is NULL, so the database engine can understand
> that if the data is missing from the tuple, it means a tuple where the
> new column is NULL. When you alter such a row, it will be rewritten in
> another place (where space is available) with the new column(s), and
> the old tuple is marked invalid. Similar to supporting variable length
> columns.


Certainly better than shifting the rest of the data down when changing
one row.

-- glen

Dave Thompson

2005-12-05, 3:57 am

On Fri, 25 Nov 2005 11:02:46 -0800, glen herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:
<snip>
> Some time ago someone wanted a file that could have data added
> (prepended) to the beginning, similar to the way one can add data at the
> end. It would not be so hard to design a file system like that, but
> with low demand, as far as I know it hasn't been done.


I worked on a proprietary and now vanished system that did. It
supported files where you could write and _destructively_ read (read
and remove) at either beginning or end. This was designed and used to
implement TECO-buffer-style editing on a low-end machine for the time
(LSI-11 with 48KB): open input file at the beginning and empty temp
file at the end; read from beginning of input into screen buffer;
scroll down by writing buffer to end of temp and reading more from
beginning of input; scroll up by writing to beginning of input and
reading from end of temp; finish by copying rest of input to end of
temp (and renaming temp in place of now-empty input in the directory).

FWLIW.

- David.Thompson1 at worldnet.att.net
glen herrmannsfeldt

2005-12-05, 8:08 am

Janne Blomqvist wrote:

(snip regarding data formats of relational database systems)

> No, it's not implemented in relational database systems, since one of
> the fundamental properties of a RDBMS is that tuples (rows) are
> unordered. If you want to impose some specific order on your result
> set, you must instruct the database to return the results of your
> query in some order (the ORDER BY clause in SQL IIRC). Depending on
> whether the relevant column in the table is indexed or not, the
> database might need to sort the result set before returning it.


(snip)

> Again, no. All the major RDBMSs store data row-wise since that results
> in much better performance for the vast majority of usage scenarios.


It seems that as memory gets faster, and disks not so much faster, the
performance point is moving toward columnwise storage. See:

http://columbia.edu/~kar/pubsk/simd.pdf

for example. In any case, it is up to the RDBMS system to find an
optimal way of processing a given data set. Some do better than others.
I was surprised to find that for

select count(*) from table;

PostgreSQL actually counts the rows, as it doesn't seem to know the
answer without counting. Others I believe don't have to count.

-- glen


Copyright 2010 codecomments.com