ZLIB - index a z_stream for later use
cchgroupmail@gmail.com

2005-05-23, 3:55 pm

I have a binary file with N individually compressed segments. Each
segment has some Info and D Data segments. I read the Info sections of
the file. Later I want to go back and read the D data segments. I
keep an index to the beginning of the data section in the file so I can
get back there. I was hoping to just copy the z_stream structure and
save it for each of the D data segments, but I get Z_DATA_ERROR. I'm
thinking it has to do with the state field of z_stream not really being
copied, since it's a pointer. Any ideas on how to do this?

Chris

John Reiser

2005-05-23, 3:55 pm

cchgroupmail@gmail.com wrote:
> ... I was hoping to just copy the z_stream structure and
> save it for each of the D data segments, but I get Z_DATA_ERROR. I'm
> thinking it has to do with the state field of z_stream not really being
> copied since it's a pointer Any ideas on how to do this?


RTFM, particularly the comments in zlib.h about Z_SYNC_FLUSH, Z_FULL_FLUSH,
and Z_FINISH as arguments to deflate().

--
cchgroupmail@gmail.com

2005-05-23, 3:55 pm

I read the FM. Actually, I found a function in the header file (which I
didn't see in the manual), inflateCopy, which says it copies the
z_stream entirely to a dest ptr. This is exactly what I was looking
for. But thanks for your kind suggestion to read the manual.

Mark Adler

2005-05-24, 3:55 am

cchgroupmail@gmail.com wrote:
> Actually I found a function in the header file ( I
> didn't see in the manual ) inflateCopy which says it copies the
> z_stream entirely to a dest ptr. This is exactly what I was looking
> for.


That may have been what you were looking for, but it may not be what
you want. Every copy of the state takes more than 32K bytes of memory.
It would be better to occasionally (say, once every few MB) use
Z_SYNC_FLUSH and mark the position, and you can then start raw
inflation from that point later without having to have an inflate state
for it.

mark

cchgroupmail@gmail.com

2005-05-24, 3:55 pm

Well, the problem (as I perceive it) is that the compressed data is of
arbitrary length (these are actually compressed variables in a Matlab
MAT file). I am always using inflate with Z_SYNC_FLUSH. I had tried
just marking the file position where the variable's data starts, and
then restarting inflation from that point later, but kept getting data
errors.

I can handle the normal compressed variables, where there's only a
single data position I need to restart inflation from, because I can
just leave the compression stream hanging there (I create a z_stream
structure for each variable in the file) until I either return to it
or just end it. But in the struct-type Matlab variables, each field of
the structure has its own data header followed by data, and then it
repeats. So what I do is read the header for the structure, then the
first variable's header, stopping at the data; copy the state and save
it with that field (to my code, a struct-type variable's data is just
an array of more variables); then skip over the data, read the next
header, etc.

I'm certainly new to compression, so any other helpful ideas would be
appreciated. Not that I expect anyone to look at the file format, but
if you are interested or just want to see how these files are set up,
the PDF documentation of the MAT file format is at
http://www.mathworks.com/access/hel...file_format.pdf

Any help/advice is greatly appreciated.

Mark Adler

2005-05-24, 3:55 pm

cchgroupmail@gmail.com wrote:
> I am always using inflate with
> Z_SYNC_FLUSH. I had tried just marking the file position where the
> variable's data starts, and then restarting inflation from that point
> later, but kept getting Data errors.


First off, if you're not using raw inflate, you need to. See
inflateInit2(). Second, you may not be getting all the compressed data
out after the Z_SYNC_FLUSH, and so not marking the position correctly.
(By the way, you don't want to "always" use Z_SYNC_FLUSH -- you want to
use it only when marking a breakpoint. If you use it too often,
compression will be significantly degraded.)

Third, from the description of your application, you may not need to
bother with any of this. It sounds like you could simply compress each
variable individually, ending with Z_FINISH, and starting a new stream
for the next variable.

mark

cchgroupmail@gmail.com

2005-05-24, 3:55 pm

I don't want to uncompress the entire variable at once; otherwise I'd
inflate the first 8 bytes to find out the uncompressed size of the
whole variable, then go back and just call uncompress on the entire
compressed variable (what I used to do). I will look into using
inflateInit2 instead. I'm using Z_SYNC_FLUSH b/c I want to read in
small bits of the entire compressed stream at a time. I need to read
the entire variable header (and optionally sub-headers if it's a
structure, cell array, or object) but stop before the data, and I
don't know how big the header is b/c it depends on what's in there.
Below is a code snippet of what I'm doing. If you want the complete
source, let me know.

matvar->name = NULL;
matvar->data = NULL;
matvar->dims = NULL;
matvar->nbytes = 0;
matvar->data_type = 0;
matvar->class_type = 0;
matvar->data_size = 0;
matvar->mem_conserve = 0;
matvar->compression = 1;
matvar->fpos = fpos;

matvar->z.zalloc = NULL;
matvar->z.zfree = NULL;
matvar->z.opaque = NULL;

/* Matlab uses magic number 56??? */
if (nBytes < 56)
    nbytes = nBytes;
else
    nbytes = 56;

bytesread += fread(comp_buf,1,nbytes,mat->fp);
matvar->z.next_in = comp_buf;
matvar->z.next_out = uncomp_buf;
matvar->z.avail_in = nbytes;
matvar->z.avail_out = 8; /* First uncompress type */
err = inflateInit(&(matvar->z));
if ( err != Z_OK ) {
    Scats_Critical("inflateInit returned %d",err);
    Scats_MatVarFree(matvar);
    break;
}

/* Read Variable tag */
ptr = uncomp_buf;
bytesread += InflateVarTag(mat,matvar,uncomp_buf);
matvar->class_type = *(int*)ptr;
ptr += 4;
nbytes = *(int*)ptr;
if ( matvar->class_type != miMATRIX ) {
    Scats_Critical("Uncompressed type not miMATRIX");
    for ( i = 0; i < matvar->z.avail_out; i++ )
        fprintf(stderr,"%02x ",(int)*(uint8_t *)ptr);
    fprintf(stderr,"\n");
    /* Skip the rest of this compressed variable */
    fseek(mat->fp,nBytes-bytesread,SEEK_CUR);
    Scats_MatVarFree(matvar);
    matvar = NULL;
    break;
}
/* Inflate Array Flags */
ptr = uncomp_buf;
bytesread += InflateArrayFlags(mat,matvar,uncomp_buf);
/* Array Flags */
if ( *(int *)ptr == miUINT32 ) {
    ptr += 8;
    array_flags = *(uint32_t*)ptr;
    if ( mat->byteswap )
        array_flags = int32Swap((int32_t*)&array_flags);
    matvar->class_type = (int)(array_flags & miCLASS_T);
    matvar->isComplex = (int)(array_flags & miCOMPLEX);
    matvar->isGlobal = (int)(array_flags & miGLOBAL);
    matvar->isLogical = (int)(array_flags & miLOGICAL);
}
ptr = uncomp_buf;
/* Inflate Dimensions */
bytesread += InflateDimensions(mat,matvar,uncomp_buf);
/* Rank and Dimension */
if ( *(int *)ptr == miINT32 ) {
    ptr += 4;
    nbytes = (int)*(int32_t*)ptr;
    ptr += 4;
    matvar->rank = nbytes / 4;
    matvar->dims = (int *)malloc(matvar->rank*sizeof(int));
    for ( i = 0; i < matvar->rank; i++ ) {
        int32_t dim;
        dim = ((int32_t*)ptr)[0];
        matvar->dims[i] = (int)dim;
        ptr += 4;
    }
    /* Dimension data is padded to an 8-byte boundary */
    if ( matvar->rank % 2 != 0 )
        ptr += 4;
}
/* Inflate variable name tag */
ptr = uncomp_buf;
bytesread += InflateVarNameTag(mat,matvar,uncomp_buf);
/* Name of variable */
if ( *(int*)ptr == miINT8 ) { /* Name not in tag */
    int len;
    ptr += 4;
    len = *(int*)ptr;

    /* Round the name length up to a multiple of 8 */
    if ( len % 8 == 0 )
        i = len;
    else
        i = len+(8-(len % 8));
    matvar->name = (char *)malloc(i+1);
    /* Inflate variable name */
    bytesread += InflateVarName(mat,matvar,matvar->name,i);
    matvar->name[len] = '\0';
} else if ( *(int16_t*)ptr == miINT8 &&
            *(int16_t*)(ptr+2) != 0 ) { /* Name in tag */
    int len;
    len = (int)*(int16_t*)(ptr+2);
    ptr+=4;
    matvar->name = (char *)malloc(len+1);
    memcpy(matvar->name,ptr,len);
    matvar->name[len] = '\0';
}



/*
*-------------------------------------------------------------------
* ZLIB Decompression (Inflate) Routines
*-------------------------------------------------------------------
*/

/*
* Inflate the data until nbytes of uncompressed data has been inflated
*/
static int
InflateSkip(scats_mat_t *mat, SCATS_MATVAR *matvar, int nbytes)
{
    uint8_t comp_buf[32],uncomp_buf[32];
    int bytesread = 0, err, cnt = 0;

    if ( !matvar->z.avail_in ) {
        matvar->z.avail_in = 1;
        matvar->z.next_in = comp_buf;
        bytesread += fread(comp_buf,1,1,mat->fp);
    }
    matvar->z.avail_out = 1;
    matvar->z.next_out = uncomp_buf;
    err = inflate(&matvar->z,Z_SYNC_FLUSH);
    if ( err != Z_OK ) {
        Scats_Critical("InflateSkip: inflate returned %d",err);
        return bytesread;
    }
    /* One output byte was produced: count it and reset the output */
    if ( !matvar->z.avail_out ) {
        matvar->z.avail_out = 1;
        matvar->z.next_out = uncomp_buf;
        cnt++;
    }
    while ( cnt < nbytes ) {
        if ( !matvar->z.avail_in ) {
            matvar->z.avail_in = 1;
            matvar->z.next_in = comp_buf;
            bytesread += fread(comp_buf,1,1,mat->fp);
        }
        err = inflate(&matvar->z,Z_SYNC_FLUSH);
        if ( err != Z_OK ) {
            Scats_Critical("InflateSkip: inflate returned %d",err);
            return bytesread;
        }
        if ( !matvar->z.avail_out ) {
            matvar->z.avail_out = 1;
            matvar->z.next_out = uncomp_buf;
            cnt++;
        }
    }

    return bytesread;
}
/*
 * Inflates the variable's tag. buf must hold at least 8 bytes
 */
static int
InflateVarTag(scats_mat_t *mat, SCATS_MATVAR *matvar, void *buf)
{
    uint8_t comp_buf[32];
    int bytesread = 0, err;

    assert(buf != NULL);

    if ( !matvar->z.avail_in ) {
        matvar->z.avail_in = 1;
        matvar->z.next_in = comp_buf;
        bytesread += fread(comp_buf,1,1,mat->fp);
    }
    matvar->z.avail_out = 8;
    matvar->z.next_out = buf;
    err = inflate(&(matvar->z),Z_SYNC_FLUSH);
    if ( err != Z_OK ) {
        Scats_Critical("InflateVarTag: inflate returned %d",err);
        return bytesread;
    }
    /* Feed one compressed byte at a time until the output is full */
    while ( matvar->z.avail_out && !matvar->z.avail_in ) {
        matvar->z.avail_in = 1;
        matvar->z.next_in = comp_buf;
        bytesread += fread(comp_buf,1,1,mat->fp);
        err = inflate(&matvar->z,Z_SYNC_FLUSH);
        if ( err != Z_OK ) {
            Scats_Critical("InflateVarTag: inflate returned %d",err);
            return bytesread;
        }
    }

    return bytesread;
}





/*
 * Inflates the Array Flags Tag and the Array Flags data. buf must
 * hold at least 16 bytes
 */
static int
InflateArrayFlags(scats_mat_t *mat, SCATS_MATVAR *matvar, void *buf)
{
    uint8_t comp_buf[32];
    int bytesread = 0, err;

    assert(buf != NULL);

    if ( !matvar->z.avail_in ) {
        matvar->z.avail_in = 1;
        matvar->z.next_in = comp_buf;
        bytesread += fread(comp_buf,1,1,mat->fp);
    }
    matvar->z.avail_out = 16;
    matvar->z.next_out = buf;
    err = inflate(&matvar->z,Z_SYNC_FLUSH);
    if ( err != Z_OK ) {
        Scats_Critical("InflateArrayFlags: inflate returned %d",err);
        return bytesread;
    }
    while ( matvar->z.avail_out && !matvar->z.avail_in ) {
        matvar->z.avail_in = 1;
        matvar->z.next_in = comp_buf;
        bytesread += fread(comp_buf,1,1,mat->fp);
        err = inflate(&matvar->z,Z_SYNC_FLUSH);
        if ( err != Z_OK ) {
            Scats_Critical("InflateArrayFlags: inflate returned %d",err);
            return bytesread;
        }
    }

    return bytesread;
}

/*
 * Inflates the Dimensions Tag and the Dimensions data. buf must hold
 * at least (8+4*rank) bytes
 */
static int
InflateDimensions(scats_mat_t *mat, SCATS_MATVAR *matvar, void *buf)
{
    uint8_t comp_buf[32];
    int bytesread = 0, err, rank, i;

    assert(buf != NULL);

    if ( !matvar->z.avail_in ) {
        matvar->z.avail_in = 1;
        matvar->z.next_in = comp_buf;
        bytesread += fread(comp_buf,1,1,mat->fp);
    }
    /* First the 8-byte dimensions tag */
    matvar->z.avail_out = 8;
    matvar->z.next_out = buf;
    err = inflate(&matvar->z,Z_SYNC_FLUSH);
    if ( err != Z_OK ) {
        Scats_Critical("InflateDimensions: inflate returned %d",err);
        return bytesread;
    }
    while ( matvar->z.avail_out && !matvar->z.avail_in ) {
        matvar->z.avail_in = 1;
        matvar->z.next_in = comp_buf;
        bytesread += fread(comp_buf,1,1,mat->fp);
        err = inflate(&matvar->z,Z_SYNC_FLUSH);
        if ( err != Z_OK ) {
            Scats_Critical("InflateDimensions: inflate returned %d",err);
            return bytesread;
        }
    }
    if ( *(int *)buf != miINT32 ) {
        Scats_Critical("Reading dimensions expected type miINT32");
        return bytesread;
    }
    /* Byte count of the dimensions data, rounded up to a multiple of 8 */
    rank = ((int *)buf)[1];
    if ( rank % 8 != 0 )
        i = 8-(rank %8);
    else
        i = 0;
    rank+=i;

    if ( !matvar->z.avail_in ) {
        matvar->z.avail_in = 1;
        matvar->z.next_in = comp_buf;
        bytesread += fread(comp_buf,1,1,mat->fp);
    }
    /* Then the dimensions data, stored right after the tag */
    matvar->z.avail_out = rank;
    matvar->z.next_out = (uint8_t *)buf+8;
    err = inflate(&matvar->z,Z_SYNC_FLUSH);
    if ( err != Z_OK ) {
        Scats_Critical("InflateDimensions: inflate returned %d",err);
        return bytesread;
    }
    while ( matvar->z.avail_out && !matvar->z.avail_in ) {
        matvar->z.avail_in = 1;
        matvar->z.next_in = comp_buf;
        bytesread += fread(comp_buf,1,1,mat->fp);
        err = inflate(&matvar->z,Z_SYNC_FLUSH);
        if ( err != Z_OK ) {
            Scats_Critical("InflateDimensions: inflate returned %d",err);
            return bytesread;
        }
    }

    return bytesread;
}

static int
InflateVarNameTag(scats_mat_t *mat, SCATS_MATVAR *matvar, void *buf)
{
    uint8_t comp_buf[32];
    int bytesread = 0, err;

    assert(buf != NULL);

    if ( !matvar->z.avail_in ) {
        matvar->z.avail_in = 1;
        matvar->z.next_in = comp_buf;
        bytesread += fread(comp_buf,1,1,mat->fp);
    }
    matvar->z.avail_out = 8;
    matvar->z.next_out = buf;
    err = inflate(&(matvar->z),Z_SYNC_FLUSH);
    if ( err != Z_OK ) {
        Scats_Critical("InflateVarNameTag: inflate returned %d",err);
        return bytesread;
    }
    while ( matvar->z.avail_out && !matvar->z.avail_in ) {
        matvar->z.avail_in = 1;
        matvar->z.next_in = comp_buf;
        bytesread += fread(comp_buf,1,1,mat->fp);
        err = inflate(&matvar->z,Z_SYNC_FLUSH);
        if ( err != Z_OK ) {
            Scats_Critical("InflateVarNameTag: inflate returned %d",err);
            return bytesread;
        }
    }

    return bytesread;
}

static int
InflateVarName(scats_mat_t *mat, SCATS_MATVAR *matvar, void *buf, int N)
{
    uint8_t comp_buf[32];
    int bytesread = 0, err;

    assert(buf != NULL);

    if ( !matvar->z.avail_in ) {
        matvar->z.avail_in = 1;
        matvar->z.next_in = comp_buf;
        bytesread += fread(comp_buf,1,1,mat->fp);
    }
    matvar->z.avail_out = N;
    matvar->z.next_out = buf;
    err = inflate(&matvar->z,Z_SYNC_FLUSH);
    if ( err != Z_OK ) {
        Scats_Critical("InflateVarName: inflate returned %d",err);
        return bytesread;
    }
    while ( matvar->z.avail_out && !matvar->z.avail_in ) {
        matvar->z.avail_in = 1;
        matvar->z.next_in = comp_buf;
        bytesread += fread(comp_buf,1,1,mat->fp);
        err = inflate(&matvar->z,Z_SYNC_FLUSH);
        if ( err != Z_OK ) {
            Scats_Critical("InflateVarName: inflate returned %d",err);
            return bytesread;
        }
    }

    return bytesread;
}
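
Both the variable-name length and the dimensions byte count in the
snippet above are rounded up to the next multiple of 8 with an if/else.
As a small sketch (pad_to_8 is a hypothetical helper name, not part of
the posted code), the same rounding can be written without a branch:

```c
#include <assert.h>

/* Round a non-negative byte count up to the next multiple of 8.
 * Hypothetical helper; equivalent to the if/else rounding above. */
static int pad_to_8(int len)
{
    assert(len >= 0);
    return (len + 7) & ~7;
}
```

For example, a 5-byte name occupies 8 bytes in the stream, while an
8-byte name needs no padding.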

Mark Adler

2005-05-24, 3:55 pm

cchgroupmail@gmail.com wrote:
> I don't want to uncompress the entire variable at once,

....
> I'm using Z_SYNC_FLUSH b/c I want to read in
> small bits of the entire compression at a time.


You have two options, which are not all that different. For either,
decide on an acceptable chunk size of your variables that balances
random access speed with compression effectiveness. If you break it up
into too many independent chunks, compression will suffer. Too few
chunks, and it will take longer to get to the point you need to, since
you always have to start decompressing at the beginning of a chunk. I
have found that around 1 MB of uncompressed data is a good chunk size,
and only degrades compression by about 1%. However that is highly data
dependent, and so you should experiment with your data.

Given the chunk size, you compress in one of two ways (I assume that
you are the person doing the compression here). Either compress each
chunk individually, ending each with a deflate(strm, Z_FINISH), and
save the start of each stream. Or, requiring a little more care,
create a single deflate stream, but end each chunk with deflate(strm,
Z_SYNC_FLUSH), and mark the start of the next chunk (one byte after the
last byte emitted from the flush). To decompress the first option,
simply use inflate normally. For the second option, use raw inflate
(see inflateInit2()) to start decompressing after the flush.

The advantage of the first method is simplicity, and you get an
integrity check for each chunk. The advantage of the second method is
that you don't have the overhead of a header and trailer (which is
insignificant for large chunks), and raw inflate is a little bit
faster since it's not calculating a check value. Also you can more
easily cross chunk boundaries if the requested output requires that.

I recommend the first method for simplicity. In fact, even if you want
some advantage of the second method, you should probably get the first
method working first, and then modify it for the second method.

For either method, you will need to find a place in your data to save
those marks where you can start decompression from. The marks should
contain both where you can start decompressing from in the compressed
data, and what offset in the uncompressed data that decompressed data
begins. For random access, you simply find the largest uncompressed
offset less than or equal to where you want data from, and start
decompressing from there until you get what you need.
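
The lookup described above can be sketched as a binary search over a
sorted table of marks. The struct and function names here are
hypothetical, and the table is assumed to be sorted by ascending
uncompressed offset:

```c
#include <assert.h>
#include <stddef.h>

/* One restart point: where to start reading compressed bytes, and the
 * uncompressed offset of the first byte decompressed from there. */
struct restart_mark {
    long comp_offset;
    long uncomp_offset;
};

/* Return the index of the last mark whose uncomp_offset <= target,
 * or -1 if every mark lies past target. marks must be sorted by
 * ascending uncomp_offset. */
static int find_mark(const struct restart_mark *marks, int count,
                     long target)
{
    int lo = 0, hi = count - 1, best = -1;

    assert(marks != NULL || count == 0);
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (marks[mid].uncomp_offset <= target) {
            best = mid;         /* candidate; look for a later one */
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return best;
}
```

Having found the mark, the reader would seek to comp_offset, start
decompressing, and discard target - uncomp_offset bytes of output
before the wanted data begins.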

> err = inflate(&matvar->z,Z_SYNC_FLUSH);


You're missing the point of Z_SYNC_FLUSH. Its purpose is to put
restart points in the compressed stream, and so only makes sense when
used with deflate(). The flush parameter of inflate() has, to first
order, no effect on the operation of inflate().

mark

cchgroupmail@gmail.com

2005-05-24, 3:55 pm

I am not the one compressing the data. It's being compressed by Matlab.
By default in Matlab versions 7+, the Matlab file type (.mat) uses
compression. Each variable in the file is compressed individually.
Again, for more detailed information, it's on the MathWorks website
http://www.mathworks.com/access/hel...file_format.pdf

I am reading their files in my C code and uncompressing them. I don't
know how/where they put flush points. Also, the header is not likely
to be too large, probably less than 100 bytes except for structures,
etc. I used Z_SYNC_FLUSH b/c according to the manual the flush
parameter for inflate is undefined for all values except Z_SYNC_FLUSH
and Z_FINISH, and Z_FINISH says it's really for single calls to inflate.
If Matlab does not put a sync point at the beginning of the data (I
guess if they just run compress or deflate once on the whole buffer)
then you couldn't start decompression from there. This is why I was
thinking of using inflateCopy. Using that for the structures, though, I
get a sig11:

Program received signal SIGSEGV, Segmentation fault.
0x080572e8 in inflate (strm=0x9df263c, flush=2) at inflate.c:898
898 this = state->lencode[BITS(state->lenbits)];

I'm guessing that BITS(state->lenbits) is exceeding the bounds of
lencode, maybe b/c the accumulator has changed?

Mark Adler

2005-05-24, 8:55 pm

cchgroupmail@gmail.com wrote:
> I am not the one compressing data. It's being compressed by Matlab.

....
> I don't know how/where they put flush points.


Almost certainly they do not insert any flush points. In that case, in
order to decompress data from the middle of the variable, you must
start decompressing at the beginning of the variable. There is no way
out of that. The only hope you have is that, having done that once, if
you want to go back and get some data before that middle point, you
can save time by having used inflateCopy() or some other approach
to save some states along the way on that first pass. You can then
restart somewhere closer to your desired access point on the
second pass, instead of starting back from the beginning again. You
would need to select the frequency of inflateCopy()'s to balance the
speed of random access against the memory requirements of the copies
(more than 32K bytes each).

Alternatively, you could cache the decompressed data and access it
directly. This whole discussion goes away if the memory to save the
decompressed variable is not prohibitive.

Or you could reprocess the file, recompressing the variables yourself
with flush points. That may be a win, depending on how many times you
need to access parts of variables, and if the variables are very large.

> I used Z_SYNC_FLUSH b/c according to the manual the flush
> parameter for inflate is undefined for all values except Z_SYNC_FLUSH
> and Z_FINISH and Z_FINISH says it's really for single calls to
> inflate.

It does? Here's what it says in zlib.h (apologies in advance if google
messes up the line breaks):

The flush parameter of inflate() can be Z_NO_FLUSH, Z_SYNC_FLUSH,
Z_FINISH, or Z_BLOCK. Z_SYNC_FLUSH requests that inflate() flush as much
output as possible to the output buffer. Z_BLOCK requests that inflate()
stop if and when it gets to the next deflate block boundary. When decoding
the zlib or gzip format, this will cause inflate() to return immediately
after the header and before the first block. When doing a raw inflate,
inflate() will go ahead and process the first block, and will return when
it gets to the end of that block, or when it runs out of data.
...
The use of Z_FINISH is never required, but can be used to inform inflate
that a faster approach may be used for the single inflate() call.

The bottom line is that if you're not trying to do a single inflate
call and not trying to decompress a block at a time, then you can use
Z_NO_FLUSH or Z_SYNC_FLUSH. As it turns out, they don't behave any
differently -- inflate always generates as much output as it can with
the provided input.

> This is why i was
> thinking of using inflateCopy. Using that though for the structures, i
> get a sig11:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x080572e8 in inflate (strm=0x9df263c, flush=2) at inflate.c:898
> 898 this = state->lencode[BITS(state->lenbits)];
>
> I'm guessing that BITS(state->lenbits) is exceeding the bounds of
> lencode maybe b/c the accumulator has changed?


The only way to get this error is if the strm you provided to inflate
is invalid or corrupted, and the state pointer in that structure is
therefore pointing off into la la land.

By the way, and this should be in the documentation: each copy made by
inflateCopy() needs to be freed by inflateEnd(). If you don't, you'll
end up with a massive memory leak.

mark

cchgroupmail@gmail.com

2005-05-24, 8:55 pm

Now we're getting back to my original "attempted" solution of using
inflateCopy. I do indeed decompress the data, and I don't need periodic
inflateCopy's b/c as I'm decompressing, I know where I want to
resume again. I probably could just uncompress the entire thing at
once, but there are a lot of times I want just the header and not the
data. So my routines are split. From running through the debugger,
the memory at lencode and distcode is not available in the inflateCopy
version, but it is in the original. And yes, I free all inflateCopy
versions too.

Maybe you can tell something from the structures, so here's the
z_stream structure

(gdb) p *strm
$3 = {next_in = 0xfeed8d60 "\234", avail_in = 1, total_in = 58,
next_out = 0xfeed8db8 "", avail_out = 8, total_out = 136, msg = 0x0,
state = 0x8ac66a0, zalloc = 0x8055d9f <zcalloc>,
zfree = 0x8055dba <zcfree>, opaque = 0x0, data_type = 64,
adler = 632555310, reserved = 0}

And the state
(gdb) p *state
$5 = {mode = LEN, last = 1, wrap = 1, havedict = 0, flags = 0,
check = 632555310, total = 136, wbits = 15, wsize = 32768, whave = 136,
write = 136, window = 0x8ac8250 "\016", hold = 0, bits = 0, length = 0,
offset = 1, extra = 0, lencode = 0x8064758, distcode = 0x8064f58,
lenbits = 9, distbits = 5, ncode = 145476096, nlen = 32, ndist = 24,
have = 0, next = 0x8ac6bc8, lens = {27502, 8250, 50, 0, 0, 0, 54665,
0 <repeats 313 times>}, work = {0 <repeats 288 times>}, codes = {{
op = 0 '\0', bits = 0 '\0', val = 0} <repeats 1440 times>}}
(gdb) p *state->window
$18 = 14 '\016'
(gdb) p *state->lencode
Cannot access memory at address 0x8064758
(gdb) p *state->distcode
Cannot access memory at address 0x8064f58
(gdb) p *state->next
$19 = {op = 0 '\0', bits = 0 '\0', val = 0}


I'm not sure where to go from here. I really don't want to uncompress
the entire thing at once, but this method doesn't seem to be working
for me.
By the way, I greatly appreciate your time and involvement in this
post.

Where I got that the behavior was undefined is on the website
http://www.gzip.org/zlib/manual.html#inflate

Mark Adler

2005-05-25, 3:56 am

cchgroupmail@gmail.com wrote:
> I don't need periodic
> inflateCopy's b/c as I'm decompressing, I know where it is i want to
> resume again. I probably could just uncompress the entire thing at
> once, but There are a lot of times I want just the header and not the
> data.


I still don't understand your application. If you just want to pause
inflating, that's what the inflateInit / inflate / inflateEnd interface
provides. You can just provide enough output space for the header, and
it will stop when it decompresses that. You can inflate as much as you
want, or as little as you want. You can go do something else, and then
come back and inflate some more. All using just the one state. Or you
can just stop where you are and free the state. Why would you need to
copy the state? inflateCopy() allows you to continue inflating with
the original state, and later go back to the same place and inflate
again from there using the copied state.

> offset = 1, extra = 0, lencode = 0x8064758, distcode = 0x8064f58,


Ah, hell. It's a bug in inflateCopy(). It will be fixed in the next
version. (Patch below.) Thanks for the report. By the way, I can
tell from the bug that you were inflating something rather small since
it was using fixed Huffman codes. There should be no reason to use
inflateCopy() on something that small (hence why the bug hasn't been
noticed until now), seeing as how decompressing the whole thing would
be faster than copying the state.

mark


*** inflate.c Sun Oct 3 19:33:51 2004
--- inflate-1.2.2p1.c Tue May 24 21:31:00 2005
***************
*** 1263,1270 ****
/* copy state */
*dest = *source;
*copy = *state;
! copy->lencode = copy->codes + (state->lencode - state->codes);
! copy->distcode = copy->codes + (state->distcode - state->codes);
copy->next = copy->codes + (state->next - state->codes);
if (window != Z_NULL)
zmemcpy(window, state->window, 1U << state->wbits);
--- 1263,1274 ----
/* copy state */
*dest = *source;
*copy = *state;
! if (state->lencode >= state->codes &&
! state->lencode <= state->codes + ENOUGH - 1)
! {
! copy->lencode = copy->codes + (state->lencode - state->codes);
! copy->distcode = copy->codes + (state->distcode - state->codes);
! }
copy->next = copy->codes + (state->next - state->codes);
if (window != Z_NULL)
zmemcpy(window, state->window, 1U << state->wbits);
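
The idea behind the patch can be illustrated with a toy structure (all
names below are hypothetical and this is not zlib code): when a struct
is copied by value, a pointer into the struct's own storage has to be
rebased onto the copy, but a pointer to a shared static table must be
left alone. The range test goes through uintptr_t because comparing
pointers into unrelated arrays with < or > is not strictly defined in
C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 8

/* Shared read-only table, standing in for fixed Huffman tables. */
static const int fixed_table[TABLE_SIZE] = {0, 1, 2, 3, 4, 5, 6, 7};

struct toy_state {
    const int *current;         /* into own_table, or at fixed_table */
    int own_table[TABLE_SIZE];
};

/* Copy src to dst, rebasing current only when it points into the
 * source's own table. */
static void toy_copy(struct toy_state *dst, const struct toy_state *src)
{
    uintptr_t p, first, last;

    assert(dst != NULL && src != NULL);
    *dst = *src;
    p = (uintptr_t)src->current;
    first = (uintptr_t)src->own_table;
    last = (uintptr_t)(src->own_table + TABLE_SIZE - 1);
    if (p >= first && p <= last)
        dst->current = dst->own_table + (src->current - src->own_table);
    /* otherwise current refers to shared storage and stays as copied */
}
```

Rebasing unconditionally, as the original code did, would turn a
pointer to the shared table into an address unrelated to either
structure, which matches the inaccessible lencode and distcode seen in
the debugger output.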

cchgroupmail@gmail.com

2005-05-25, 8:55 am

Thank you, Mark. Yeah, I've been trying to debug inflate.c b/c I knew the
memory had not been corrupted. Yes, decompressing the entire thing at
once would be faster, easier, etc., but Matlab files can get pretty
large, so just b/c my test files are small does not imply actual files
used would be small. Also, the header is small, the data section not
so much. Some of the covariance matrices I have are a couple hundred
MB. I'd rather not inflate the entire thing. Now, the point I tried
to illustrate before was that with normal variables and ONE data
section, I had no problem b/c, as you said, I just left the inflation
state hanging and picked it back up again if I wanted to read the data.
The situation where I need inflateCopy is when the variable stored is
a structure with X number of fields. Each field is essentially its
"own" Matlab variable. So, in order to read the structure header and
the header for each of the X data segments without reading the data, I
need X reentry points into the compression stream so I can just seek
back there and uncompress that specific variable. I don't know if I've
illustrated my intentions well enough, but thank you for looking into
the inflateCopy issue. I'll patch the zlib source and try it again
today.

Chris

cchgroupmail@gmail.com

2005-05-25, 3:55 pm

Mark,
I can't thank you enough for your involvement in this. After
patching the source, my code runs great! I am able to recover all the
data including the structures. Sorry I was not more clear about my
intentions and the reasons behind them. If you haven't yet read my
previous post replying to your patch, you may want to look at it so
hopefully you'll get a better idea of what I'm trying to do. Thank you
again for your help.

Chris
