Redundant information in ZIP file format

Figolu

2006-11-29, 7:55 am

Hi all,

I wonder why there is redundant information about files in a zip
archive.

For example, the 'file name' appears in the local file header, and
also in the central directory.

Can you explain why?

Thanks,
Tom

cr88192

2006-11-29, 7:55 am


"Figolu" <bucher@virtools.com> wrote in message
news:1164798647.746625.184470@j72g2000cwa.googlegroups.com...
> Hi all,
>
> I wonder why there is redundant information about files in a zip
> archive.
>
> For example, the 'file name' appears in the local file header, and
> also in the central directory.
>
> Can you explain why?
>


not sure why it was done this way originally, but I can come up with a few
reasons as to why it makes sense now:

having local headers allows one to work more flexibly with free space,
helps recovery in the case of errors, and makes it possible to rebuild the
central directory if it is lost;

having the central directory allows much quicker access, as one doesn't have
to process the whole file as often.


or, at least for now, this is what I can guess...


> Thanks,
> Tom
>



Figolu

2006-11-29, 7:55 am

Well,
it would make sense to me if the Central Directory were at the start of
the file, so that we would not have to parse the whole file to get the
table of contents.

With non-seekable access to the file, I have to read all the data
before reaching the Central Directory, so by then I already know
everything I need and the directory is of no use to me.

Moreover, the specification doesn't say what to do if the redundant
information does not match (a local file name different from the central
file name, for example). Is there any information about handling this
issue (if it occurs)?

Tom
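
A reader that wants to detect such a mismatch could compare the two copies of each name itself. The following is only a minimal sketch under stated assumptions: the archive fits in memory, is not ZIP64 or multi-disk, and has no comment containing the end-of-directory signature. The function name `name_mismatches` and the helper `make_zip` are made up for illustration; the record layouts follow the published ZIP field order.

```python
import io
import struct
import zipfile

EOCD = struct.Struct("<4s4H2LH")     # end of central directory record (22 bytes)
CDIR = struct.Struct("<4s6H3L5H2L")  # central directory file header (46 bytes)
LOCAL = struct.Struct("<4s5H3L2H")   # local file header (30 bytes)


def name_mismatches(data: bytes):
    """Return (central_name, local_name) pairs whose two copies differ."""
    eocd_pos = data.rfind(b"PK\x05\x06")
    if eocd_pos < 0:
        raise ValueError("end of central directory record not found")
    eocd = EOCD.unpack_from(data, eocd_pos)
    total_entries, cd_offset = eocd[4], eocd[6]

    mismatches = []
    pos = cd_offset
    for _ in range(total_entries):
        entry = CDIR.unpack_from(data, pos)
        name_len, extra_len, comment_len, local_offset = (
            entry[10], entry[11], entry[12], entry[16])
        central_name = data[pos + CDIR.size:pos + CDIR.size + name_len]

        # Follow the stored offset to the local header and read its name.
        local = LOCAL.unpack_from(data, local_offset)
        name_start = local_offset + LOCAL.size
        local_name = data[name_start:name_start + local[9]]

        if central_name != local_name:
            mismatches.append((central_name, local_name))
        pos += CDIR.size + name_len + extra_len + comment_len
    return mismatches


def make_zip() -> bytes:
    """Hypothetical helper: a one-member archive built in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", b"hello")
    return buf.getvalue()


good = make_zip()
bad = bytearray(good)
bad[LOCAL.size] = ord("b")  # change only the local copy of the name
print(name_mismatches(good))
print(name_mismatches(bytes(bad)))
```

What to do once a mismatch is found (reject the archive, trust the central directory, or trust the local header) remains a policy choice for the reader.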

cr88192 wrote:

> "Figolu" <bucher@virtools.com> wrote in message
> news:1164798647.746625.184470@j72g2000cwa.googlegroups.com...
>
> not sure why it was done this way originally, but I can come up with a few
> reasons as to why it makes sense now:
>
> having local headers allows one to work more flexibly with free space,
> helps recovery in the case of errors, and makes it possible to rebuild the
> central directory if it is lost;
>
> having the central directory allows much quicker access, as one doesn't have
> to process the whole file as often.
>
>
> or, at least for now, this is what I can guess...
>
>

cr88192

2006-11-29, 9:55 pm

not sure why some posts do this, too lazy to adjust though...

arguably, it would make sense if the central directory, or at least its
offset, were available at the start of the file.

however, the usual way of locating the central directory is to start at the
end of the file and scan backwards a short distance, looking for the end of
central directory marker. at that point we know where the directory is
located.
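
The backwards scan described above might look roughly like this. It is a sketch, not a reference implementation: it assumes a seekable file object, ignores ZIP64 and multi-disk archives, and the name `find_eocd` is made up for the example. The 22-byte record size and the 16-bit comment length come from the ZIP record layout.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"
EOCD = struct.Struct("<4s4H2LH")  # end of central directory record (22 bytes)
MAX_COMMENT = 0xFFFF              # the comment length field is 16 bits


def find_eocd(f):
    """Scan backwards from the end of a seekable file for the EOCD record.

    Returns (eocd_offset, total_entries, cd_size, cd_offset).
    """
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    tail_len = min(file_size, EOCD.size + MAX_COMMENT)
    f.seek(file_size - tail_len)
    tail = f.read(tail_len)

    pos = tail.rfind(EOCD_SIG)
    while pos >= 0:
        if pos + EOCD.size <= len(tail):
            rec = EOCD.unpack_from(tail, pos)
            # Accept a candidate only if its comment runs exactly to EOF.
            if pos + EOCD.size + rec[7] == len(tail):
                return file_size - tail_len + pos, rec[4], rec[5], rec[6]
        pos = tail.rfind(EOCD_SIG, 0, pos)  # try an earlier candidate
    raise ValueError("end of central directory record not found")


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"hello")
    zf.writestr("b.txt", b"world")
    zf.comment = b"note"
print(find_eocd(buf))
```

Checking that the comment length lands exactly on end of file is one way to reject a signature that merely happens to occur inside the comment; other readers use different sanity checks.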


yes, reading forwards is another common (and simple) way to access a zip
file. however, note that one is technically allowed to have garbage between
the headers, so a processor working this way may need to be able to handle
this garbage (scanning forwards for a header, in some cases).

in this particular case, the CDIR is essentially useless.
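
A forward reader that tolerates garbage could be sketched like this. The name `walk_local_headers` is made up, and the approach is deliberately naive: it trusts the sizes in each local header when they are present and otherwise falls back to searching for the next signature, which can be fooled by member data or garbage that happens to contain the same four bytes.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"
LOCAL = struct.Struct("<4s5H3L2H")  # local file header (30 bytes)
FLAG_DATA_DESCRIPTOR = 0x08         # sizes and CRC follow the data instead


def walk_local_headers(data: bytes):
    """Yield (offset, raw_name) for each local header found reading forwards."""
    pos = 0
    while True:
        # Resynchronise: skip anything that is not a local header.
        pos = data.find(LOCAL_SIG, pos)
        if pos < 0 or pos + LOCAL.size > len(data):
            return
        header = LOCAL.unpack_from(data, pos)
        flags, comp_size = header[2], header[7]
        name_len, extra_len = header[9], header[10]
        name_start = pos + LOCAL.size
        yield pos, data[name_start:name_start + name_len]

        pos = name_start + name_len + extra_len
        if not flags & FLAG_DATA_DESCRIPTOR:
            pos += comp_size  # size is known, so jump over the member data
        # Otherwise the size field is zero and the search resumes in the data.


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"hello")
    zf.writestr("b.txt", b"world")
junk = b"leading garbage"
for offset, name in walk_local_headers(junk + buf.getvalue()):
    print(offset, name)
```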


typically, it is assumed that the information matches, but this is not
always the case:
some information, such as the size and CRC, may optionally be located after
the compressed file's contents, and is thus only convenient to access in the
central directory;
in some cases, eg, when using central directory encryption, the contents of
the local headers are essentially garbage;
...
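
The first case (size and CRC stored after the data) can be observed with Python's zipfile module, which sets general purpose flag bit 3 when it writes to a stream it cannot seek in. This is a small sketch; the `WriteOnly` class and `local_sizes` function are made up for the example and only look at the first local header.

```python
import io
import struct
import zipfile

LOCAL = struct.Struct("<4s5H3L2H")  # local file header (30 bytes)
FLAG_DATA_DESCRIPTOR = 0x08         # general purpose bit 3


class WriteOnly:
    """A sink without seek() or tell(), standing in for a pipe or socket."""

    def __init__(self):
        self.buf = bytearray()

    def write(self, chunk):
        self.buf += chunk
        return len(chunk)

    def flush(self):
        pass


def local_sizes(data: bytes, offset: int = 0):
    """Read the flag, CRC and size fields of the local header at offset."""
    header = LOCAL.unpack_from(data, offset)
    if header[0] != b"PK\x03\x04":
        raise ValueError("no local file header at this offset")
    return {
        "deferred": bool(header[2] & FLAG_DATA_DESCRIPTOR),
        "crc": header[6],
        "compressed": header[7],
        "uncompressed": header[8],
    }


sink = WriteOnly()
with zipfile.ZipFile(sink, "w") as zf:
    zf.writestr("a.txt", b"hello")
data = bytes(sink.buf)

print(local_sizes(data))  # what the local header claims
central = zipfile.ZipFile(io.BytesIO(data)).getinfo("a.txt")
print(central.file_size, central.CRC)  # what the central directory records
```

In an archive written this way the local header carries zeros for these fields, so a reader has to take them from the data descriptor after the member data or from the central directory.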


and so, this redundancy, though not always useful (some may not need the
central directory, and others may not need the local headers), does
contribute to the general flexibility and utility of the format (for
example, with a little processing and tweaky algos, zip is a usably
effective format, eg, for a read/write virtual filesystem, even though this
goes outside the scope of its design intent).

and such.






Ben Rudiak-Gould

2006-11-30, 3:55 am

Figolu wrote:
> it would make sense to me if the Central Directory were at the start of
> the file, so that we would not have to parse the whole file to get the
> table of contents.
>
> With non-seekable access to the file, I have to read all the data
> before reaching the Central Directory, so by then I already know
> everything I need and the directory is of no use to me.


The central directory is at the end because otherwise streaming compression
would be impossible -- you'd have to buffer all of the compressed data
before starting to write the output. Pretty much every compressed format is
designed with this in mind.

Usually when one is indexing zip files one has random access to them, at
least in my experience. And when one is streaming zip files one usually
wants to unpack them, not just index them, at the end of the pipeline. So I
think it's a good design.
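
To illustrate the streaming argument, here is a small sketch using Python's zipfile module with a write-only sink. `ChunkSink` and `stream_zip` are made-up names. The sketch only shows the order in which bytes leave the writer: member data goes out as each member is added, and the central directory follows once everything else has been written.

```python
import zipfile


class ChunkSink:
    """A write-only sink that records each chunk, standing in for a pipe."""

    def __init__(self):
        self.chunks = []

    def write(self, chunk):
        self.chunks.append(bytes(chunk))
        return len(chunk)

    def flush(self):
        pass


def stream_zip(members, sink):
    """Write (name, payload) pairs to sink.

    Returns the number of bytes already emitted after each member.
    """
    progress = []
    with zipfile.ZipFile(sink, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, payload in members:
            zf.writestr(name, payload)
            progress.append(sum(len(c) for c in sink.chunks))
    return progress


sink = ChunkSink()
progress = stream_zip([("a.txt", b"a" * 1000), ("b.txt", b"b" * 1000)], sink)
data = b"".join(sink.chunks)
print(progress, len(data))
# The central directory starts right where the last member ended.
print(data[progress[-1]:progress[-1] + 4])
```

With the directory at the front, the writer would need every member's size, CRC and offset before emitting the first byte.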

-- Ben

Anthony Naggs

2006-12-11, 6:57 pm

After much consideration Figolu decided to share these wise words:
>Hi all,
>
>I wonder why there is redundant information about files in a zip
>archive.
>
>For example, the 'file name' appears in the local file header, and
>also in the central directory.
>
>Can you explain why?


Two reasons come to mind.

Firstly, a zip file may be added to the end of another file, typically a
.exe file that decompresses the data on a particular platform. You do
not know how far from the start you must search for the .zip file
header. But on a random access file system you know the zip file
comment is limited to 64kB, so you only have to look that far from the
end.
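
As a rough illustration of the appended-archive case, the start of the zip data inside a larger file can be worked out from the end of central directory record alone. This is a sketch under assumptions: the archive was built on its own and then appended (so its stored offsets are relative to its own start), it is not ZIP64, and the stub contains no end-of-directory signature after the real one. `archive_start` is a made-up name.

```python
import io
import struct
import zipfile

EOCD = struct.Struct("<4s4H2LH")  # end of central directory record (22 bytes)


def archive_start(data: bytes) -> int:
    """Return the offset at which the zip archive begins inside data."""
    eocd_pos = data.rfind(b"PK\x05\x06")
    if eocd_pos < 0:
        raise ValueError("end of central directory record not found")
    rec = EOCD.unpack_from(data, eocd_pos)
    cd_size, cd_offset = rec[5], rec[6]
    # The directory ends where the EOCD record begins, so it really starts
    # at eocd_pos - cd_size. The stored cd_offset counts from the archive's
    # own first byte, and the difference is the length of the prefix.
    return eocd_pos - cd_size - cd_offset


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", b"hello")
stub = b"MZ" + bytes(100)  # stand-in for an executable stub
combined = stub + buf.getvalue()

print(archive_start(combined))
# Python's zipfile applies the same kind of adjustment when reading.
print(zipfile.ZipFile(io.BytesIO(combined)).read("a.txt"))
```

Some tools rewrite the stored offsets to be relative to the whole file when they attach the stub; for such files this calculation would simply return 0.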

Secondly, when Pkzip first appeared the main problem it was helping to
solve was backing up a hard drive to a set of floppy disks. When
uncompressing, the whole zip archive may not be present, so the tools
work backwards through the set of floppies to find the start of the
central directory. The tool then knows which disk number to ask for to
extract each file in the archive.


ttfn,
tony
--
BAD COMPUTER! That's my registry file you've trashed.