Author loading data from ascii file
Felix

2005-08-30, 7:03 pm

Hi!
I have to read data from an ASCII file that is several GB in size. Each
line represents an individual data point. I would like to read only some
lines; the line numbers are stored in a MATLAB vector. Is there any
way to do this without loading the whole array (which is too
big)?
Thanks for any suggestions!
Felix.
PB

2005-08-30, 7:03 pm

Felix wrote:
>
>
> Hi!
> I have to read data from an ASCII file that is several GB in size. Each
> line represents an individual data point. I would like to read only
> some
> lines; the line numbers are stored in a MATLAB vector. Is there any
> way to do this without loading the whole array (which is too
> big)?
> Thanks for any suggestions!
> Felix.


Hi Felix!

I haven't had this problem myself, but the memory mapping in R14
might help:

<http://www.mathworks.com/access/hel...g/ch10_in9.html>

/PB
felix

2005-08-30, 7:03 pm

Hi PB!
I haven't tried it yet, but it looks promising!
Thanks for the hint,
felix.
Peter Boettcher

2005-08-31, 7:01 pm

Felix <feboll@gmx.de> writes:

> Hi!
> I have to read data from an ASCII file that is several GB in size. Each
> line represents an individual data point. I would like to read only some
> lines; the line numbers are stored in a MATLAB vector. Is there any
> way to do this without loading the whole array (which is too
> big)?


Random access in an ASCII file is very difficult, if not impossible,
as the program can never know how many linefeeds exist between any two
byte offsets without reading them all.

I would suggest converting the whole mess to a binary file, ONCE.
Seeking around in a binary file is straightforward, and faster too.

I would use a block-processing approach to converting the file. Read
one line, or a bunch of lines (100? 1000?) at once, using fscanf or
fgetl. Convert, then fwrite the output to your output file. That
way, only a small amount of the file is in memory at any time.
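
A minimal sketch of the block-wise conversion described above, not a tested production routine. It assumes every line holds the same number of whitespace-separated numeric values (3 here); the file names, the block size, and the small demo file it creates are placeholders for illustration only.

```matlab
% Assumed layout: nCols numbers per line, stored as doubles in the output.
txtFile   = fullfile(tempdir, 'demo_points.txt');  % hypothetical input name
binFile   = fullfile(tempdir, 'demo_points.bin');  % hypothetical output name
nCols     = 3;      % values per line (assumption)
blockRows = 1000;   % lines converted per pass

% Create a tiny stand-in for the multi-GB file: 10 lines, "1 2 3", "4 5 6", ...
fid = fopen(txtFile, 'w');
fprintf(fid, '%g %g %g\n', 1:30);
fclose(fid);

fin  = fopen(txtFile, 'r');
fout = fopen(binFile, 'w');
nRows = 0;
while true
    % fscanf fills column by column, so each column of "block" is one line.
    block = fscanf(fin, '%f', [nCols, blockRows]);
    if isempty(block)
        break;   % end of file (or nothing numeric left to read)
    end
    % fwrite also writes column by column, so each line's values stay contiguous.
    fwrite(fout, block, 'double');
    nRows = nRows + size(block, 2);
end
fclose(fin);
fclose(fout);
```

Only one block of at most blockRows lines is held in memory at a time, so the block size can be tuned to the available RAM.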

Then, for random access, fseek to the byte offset you need (row
number x row size x element size, counting rows from zero).
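
As a rough illustration of the offset arithmetic, here is a sketch that reads selected rows from a binary file of doubles. The file name, the three-column layout, and the vector of wanted line numbers are assumptions for the demo; the block writes its own small test file so it runs on its own.

```matlab
binFile = fullfile(tempdir, 'demo_rows.bin');  % hypothetical file name
nCols   = 3;                                   % values per row (assumption)
elSize  = 8;                                   % bytes per double

% Demo data: row r holds [3r-2, 3r-1, 3r], written row after row.
fid = fopen(binFile, 'w');
fwrite(fid, reshape(1:30, nCols, []), 'double');
fclose(fid);

wanted = [2 7 10];                   % 1-based line numbers to fetch
out    = zeros(numel(wanted), nCols);

fid = fopen(binFile, 'r');
for k = 1:numel(wanted)
    % MATLAB line numbers start at 1, byte offsets at 0, hence the "- 1".
    offset = (wanted(k) - 1) * nCols * elSize;
    if fseek(fid, offset, 'bof') ~= 0
        fclose(fid);
        error('Seek to row %d failed.', wanted(k));
    end
    out(k, :) = fread(fid, nCols, 'double')';
end
fclose(fid);
```

Each requested row costs one seek and one small read, regardless of where it sits in the file.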



--
Peter Boettcher <boettcher@ll.mit.edu>
MIT Lincoln Laboratory
MATLAB FAQ: http://www.mit.edu/~pwb/cssm/
Felix

2005-08-31, 7:01 pm

Hi!
That should work. I think the guys at MathWorks did exactly this when
implementing the memmapfile object; above, PB gave me the hint. It
looks very useful!
Regards,

Felix.
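
For comparison, a hedged sketch of the memmapfile route mentioned in this thread. memmapfile maps raw bytes, so this assumes the data has already been converted to a binary file of doubles with a fixed number of values per row; the file name, the three-column layout, and the field name 'x' are illustrative choices only.

```matlab
binFile = fullfile(tempdir, 'demo_mmap.bin');  % hypothetical file name
nCols   = 3;                                   % values per row (assumption)

% Demo data: row r holds [3r-2, 3r-1, 3r], written row after row.
fid = fopen(binFile, 'w');
fwrite(fid, reshape(1:30, nCols, []), 'double');
fclose(fid);

% Derive the row count from the file size (8 bytes per double).
info  = dir(binFile);
nRows = info.bytes / (nCols * 8);

% Map the file as one nCols-by-nRows array; column j is row j of the data.
m = memmapfile(binFile, ...
    'Format', {'double', [nCols nRows], 'x'}, ...
    'Repeat', 1);

wanted   = [2 7 10];               % 1-based line numbers to fetch
selected = m.Data.x(:, wanted)';   % one output row per requested line
```

Indexing into m.Data.x pulls in only the parts of the file that are touched, so the whole array never has to be loaded at once.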