











 

Author seek to the end of file
Joe

2007-07-26, 3:55 am

Hi,

Using the C/C++ zlib APIs, I would like to go to the end of the file.

Unfortunately, the zlib doesn't support SEEK_END in the gzseek() API.

Can you please let me know if you have any alternatives?

TIA, Joe

Mark Adler

2007-07-27, 6:56 pm

On Jul 25, 11:33 pm, Joe <divya_...@yahoo.com> wrote:
> Unfortunately, the zlib doesn't support SEEK_END in the gzseek() API.
>
> Can you please let me know if you have any alternatives?


You can seek to the end using the usual file functions, and then pass
the descriptor to zlib using gzdopen().

Mark

Joe

2007-07-30, 7:56 am

On Jul 28, 12:59 am, Mark Adler <mad...@alumni.caltech.edu> wrote:
> On Jul 25, 11:33 pm, Joe <divya_...@yahoo.com> wrote:
>
>
>
> You can seek to the end using the usual file functions, and then pass
> the descriptor to zlib using gzdopen().
>
> Mark


I'm reading a zip file. So can I use the usual file functions like
fseek() to seek to the end of file?


Mark Adler

2007-07-30, 6:56 pm

On Jul 30, 4:41 am, Joe <divya_...@yahoo.com> wrote:
> I'm reading a zip file. So can I use the usual file functions like
> fseek() to seek to the end of file?


zlib does not directly support the zip format.

Mark

Hans-Peter Diettrich

2007-07-30, 6:56 pm

Joe wrote:

> I'm reading a zip file. So can I use the usual file functions like
> fseek() to seek to the end of file?


Such functions have no restrictions with regard to the file type. But I
assume that you want to do something more specific to a file type?

BTW, AFAIR zlib is not applicable to zip files.

At the end of a zip file resides the file comment, preceded by the
global file directory. Thus seeking to the end of a zip file is of very
little practical use.

Moves within a compressed file will require decompression of the
file up to the given position. Depending on the compression format, it
may be possible to skip over full compressed blocks without
decompressing them. But don't confuse that virtual file position
with the physical file position.

DoDi

Joe

2007-07-31, 3:56 am

On Jul 30, 7:18 pm, Mark Adler <mad...@alumni.caltech.edu> wrote:
> On Jul 30, 4:41 am, Joe <divya_...@yahoo.com> wrote:
>
>
> zlib does not directly support the zip format.
>
> Mark


Sorry, I meant a gzip file, not a Windows zip file. Can I use fseek() on
this gzip file to seek to the end of file?

Thanks,

Joe

2007-07-31, 3:56 am

On Jul 30, 7:11 pm, Hans-Peter Diettrich <DrDiettri...@aol.com> wrote:
> Joe wrote:
>
> Such functions have no restrictions with regard to the file type. But I
> assume that you want to do something more specific to a file type?
>
> BTW, AFAIR zlib is not applicable to zip files.
>
> At the end of a zip file resides the file comment, preceded by the
> global file directory. Thus seeking to the end of a zip file is of very
> little practical use.
>
> Moves within a compressed file will require decompression of the
> file up to the given position. Depending on the compression format, it
> may be possible to skip over full compressed blocks without
> decompressing them. But don't confuse that virtual file position
> with the physical file position.
>
> DoDi


I would like to find out the size of the gz file using zlib API calls.
For this, I was thinking of calling gzseek() with SEEK_END and then
using gztell(). But unfortunately SEEK_END is not supported. Is there a
way to overcome this?

Thanks,

Hans-Peter Diettrich

2007-07-31, 3:56 am

Joe wrote:

> I would like to find out the size of the gz file using zlib API calls.
> For this, I was thinking of calling gzseek() with SEEK_END and then
> using gztell(). But unfortunately SEEK_END is not supported. Is there a
> way to overcome this?


You'll have to gzread until the end of the stream is reached, summing up
the number of bytes read.

As I understand zlib, it's a stream format, so the size of the
stream cannot be known before the sender, i.e. inflate(), stops
providing further data.

DoDi

Joe

2007-07-31, 3:56 am

On Jul 31, 11:35 am, Hans-Peter Diettrich <DrDiettri...@aol.com>
wrote:
> Joe wrote:
>
> You'll have to gzread until the end of the stream is reached, summing up
> the number of bytes read.
>
> As I understand zlib, it's a stream format, so the size of the
> stream cannot be known before the sender, i.e. inflate(), stops
> providing further data.
>
> DoDi


gzseek() returns the resulting offset location in bytes from the
beginning of the uncompressed stream. Even if I get to the end of the
uncompressed stream, I'm fine with that.

Reading until the end of the stream using gzread would be extremely
slow. So I think there should be a better way.

Thanks

Mark Adler

2007-07-31, 3:56 am

On Jul 31, 12:46 am, Joe <divya_...@yahoo.com> wrote:
> Reading until the end of the stream using gzread would be extremely
> slow. So I think there should be a better way.


There isn't a better way that is completely reliable. To really know,
you need to decompress all of the data.

There is however a better way that works most of the time. That would
be to read (using the normal file functions) the last four bytes of
the gzip file, and interpret those as a little-endian 32-bit unsigned
integer. That is most likely the uncompressed size of the data in the
gzip file, since the gzip trailer ends with that four-byte length.

So why do I say only "most likely"? There are several reasons that
the last four bytes might not be the uncompressed length:

1. The uncompressed data might be >= 4 GB (this is permitted by the
gzip format, in which case the last four bytes are the length modulo
2^32).

2. The file might be a concatenation of several gzip files, so you'll
only get the length of the last one. (The gzip utility automatically
decompresses all of them.)

3. There might be junk after the end of the gzip stream in the file.

So if you really need to know the uncompressed length, you need to
decompress it all.

Mark
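
For illustration, the "last four bytes" shortcut described above might be
sketched as follows, using only the standard C file functions. The
function name gzip_trailer_isize is invented for this example, and the
value it returns is only a guess at the uncompressed size, subject to all
three caveats listed in the post (data of 4 GB or more, concatenated gzip
streams, trailing junk).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read the last four bytes of a file and interpret
 * them as a little-endian 32-bit unsigned integer. For a single, clean
 * gzip stream this is the uncompressed length modulo 2^32.
 * Returns 0 on success, -1 if the file cannot be opened or is too short. */
int gzip_trailer_isize(const char *path, uint32_t *isize)
{
    unsigned char b[4];
    FILE *f;
    int ok;

    assert(path != NULL && isize != NULL);

    f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Position four bytes before the end, then read the length field. */
    ok = fseek(f, -4L, SEEK_END) == 0 &&
         fread(b, 1, sizeof b, f) == sizeof b;
    fclose(f);
    if (!ok)
        return -1;

    /* Assemble the bytes explicitly so the result does not depend on
     * the byte order of the host machine. */
    *isize = (uint32_t)b[0] |
             ((uint32_t)b[1] << 8) |
             ((uint32_t)b[2] << 16) |
             ((uint32_t)b[3] << 24);
    return 0;
}
```

This never touches the compressed data, so it is fast regardless of file
size, but it does not verify that the file is a gzip file at all.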

Phil Carmody

2007-07-31, 3:56 am

Joe <divya_krs@yahoo.com> writes:
> On Jul 31, 11:35 am, Hans-Peter Diettrich <DrDiettri...@aol.com>
> wrote:
>
> gzseek() returns the resulting offset location in bytes from the
> beginning of the uncompressed stream. Even if I get to the end of the
> uncompressed stream, I'm fine with that.
>
> Reading until the end of the stream using gzread would be extremely
> slow. So I think there should be a better way.


What do you want to do in the case where the single .gz file contains
several concatenated gzipped streams?

Phil
--
Dear aunt, let's set so double the killer delete select all.
-- Microsoft voice recognition live demonstration

Hans-Peter Diettrich

2007-07-31, 6:56 pm

Joe wrote:

> gzseek() returns the resulting offset location in bytes from the
> beginning of the uncompressed stream. Even if I get to the end of the
> uncompressed stream, I'm fine with that.


Okay, that saves a few lines of code ;-)

> Reading until the end of the stream using gzread would be extremely
> slow. So I think there should be a better way.


You can try to gzseek() to a sufficiently high position. Perhaps it works?

You might find more about possible speedups in the gzseek() code itself...

DoDi