









 

Subject: lame (Re: ZPACK-2: Read/Write ZIP lib)
Author: cr88192
Date: 2006-09-30, 6:55 pm


"cr88192" <cr88192@NOSPAM.hotmail.com> wrote in message
news:6150c$451dd1b2$ca83a8d6$12673@saipan.com...
> well, I have gotten the lib written and working well enough.
> however, I am feeling too lazy to put it online right now unless anyone is
> interested.
>
> if anyone wants to look at the source or mess with the lib or anything,
> they can email me here: cr88192@hormail.com and I can respond with the
> source as an attachment or something...
>

well, at first I figured no one gave a crap (and maybe they don't), now I
notice, I failed at typing my email addr correctly...
cr88192@hotmail.com ...

a disadvantage, really, of lacking much time...


so, here is a maybe more relevant question:

what does the lib do?
well, it reads and writes zipfiles.

beyond that: it can do both at the same time, and is smart enough not to
expand the archive if it can find space inside the archive somewhere.
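
A minimal sketch of how the "find space inside the archive" step might look, assuming the lib tracks the byte ranges already in use as a sorted, non-overlapping list. The names `zp_extent` and `zp_find_gap` are made up for illustration and are not the lib's actual API; first-fit is just one plausible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical: one used byte range [start, start+len) in the archive */
typedef struct {
    int64_t start;
    int64_t len;
} zp_extent;

/* First-fit search over used extents (sorted by start, non-overlapping).
 * Returns the offset of the first gap of at least `need` bytes.
 * If no gap fits, returns the end of the last extent, meaning the
 * data has to be appended and the archive grows. */
int64_t zp_find_gap(const zp_extent *used, size_t n, int64_t need)
{
    int64_t pos = 0;               /* end of the previous extent */
    size_t i;

    assert(need >= 0);
    for (i = 0; i < n; i++) {
        if (used[i].start - pos >= need)
            return pos;            /* gap before extent i is big enough */
        pos = used[i].start + used[i].len;
    }
    return pos;                    /* no gap found: append */
}
```

With used ranges at 0..100, 150..200 and 400..500, a 40-byte request lands at offset 100, while a 300-byte request falls through to offset 500.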

it is designed such that memory use and performance should be reasonable.


what is it intended for?

mostly for read/write, filesystem-like uses.
the fact that it is a zip file should remain at least vaguely hidden.
the api resembles an abstract interface (much of the time), as such the file
interface vaguely resembles that of stdio (however, a more stdio-like
interface is possible).

note that it uses contexts, and can have multiple zipfiles open at once.
since my main projects included their own VFS system, and I would be using
the lib through that system, I didn't implement such a system within the
lib itself.

if needed, such an interface could be added without too much effort.


what are some limitations?

this is likely to work poorly as a traditional archiver.
since caches are kept in memory, and files are compressed in buffers, this
is not well suited for larger files (the main emphasis is placed on a larger
number of smaller files).

attempting to resolve the above is likely to involve breaking compatibility
with existing zip tools (mostly by adding a 'fragmentation' mechanism).

though using 'patching' could be possible in this case, it is both
suboptimal and patented.

eventually, it may make sense to support zip64 extensions, but for now I
will assume that the archives involved are likely to be somewhat smaller.


as noted previously, there is an issue where the first 4 bytes need to hold
a special status. likewise, byte offset 0 was previously used as a 'nothing'
offset. since in the zpack format a header was located there, I assumed that
anything pointing to the file header was 'obviously wrong'. since zip can
have a file header there, I may need to figure out an alternative
(which is made problematic by my inconsistent treatment of file offset
types...).

actually, I could probably make all the offsets be represented as signed 64
bit values, allowing me to use -1. this also meshes well with some parts of
the code using signed ints instead.
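
A small sketch of the signed-offset idea, assuming a single offset typedef is used everywhere. The names `zp_off`, `ZP_OFF_NONE`, `zp_off_from_u32` and `zp_off_valid` are hypothetical, not taken from the lib.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t zp_off;            /* hypothetical: one offset type everywhere */
#define ZP_OFF_NONE ((zp_off)-1)   /* 'nothing' marker; offset 0 stays valid */

/* Widen an unsigned 32-bit offset as stored in a classic zip header.
 * The result is always >= 0, so it can never collide with ZP_OFF_NONE. */
zp_off zp_off_from_u32(uint32_t raw)
{
    return (zp_off)raw;
}

/* Any negative value is treated as 'no offset'. */
int zp_off_valid(zp_off o)
{
    return o >= 0;
}
```

This keeps offset 0 usable for a real file header, and the full unsigned 32-bit range still fits without going negative.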


what about fragmentation/larger files?

this could probably be a considered/optional feature (since it would make
the files incompatible with other zip tools).

probably this could be done with a special method number (or a custom extra
header).

eg:
0x505A: or whatever...

content holds info about the spans:
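{
FOURCC fcc; //'frag'
u32 _pad; //padding
u64 vsize; //virtual file size
u16 _pad1[7]; //padding (header comes to 32 bytes)
u16 cnt; //number of spans

{
u64 voffs; //virtual offset
u64 doffs; //data offset (in archive)
u32 csize; //compressed size
u32 usize; //uncompressed size
u32 crc32;
u16 method;
u16 flag;
}span[cnt]; //32 bytes each
}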

probably each fragment span would lack a custom header, instead being raw
compressed data.
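
As a rough sketch, the fixed-size layout above could be parsed from a byte buffer like this. It assumes little-endian fields (as in the rest of the zip format), no padding between fields, and the FOURCC stored as the bytes 'f','r','a','g' in that order. None of this is confirmed by the post, and `zp_span`, `zp_rd_le` and `zp_parse_frag` are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t voffs, doffs;          /* virtual offset, data offset in archive */
    uint32_t csize, usize, crc32;
    uint16_t method, flag;
} zp_span;

/* Read an nbytes-wide little-endian unsigned value (byte order is assumed). */
uint64_t zp_rd_le(const uint8_t *p, int nbytes)
{
    uint64_t v = 0;
    int i;
    for (i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Parse the 32-byte header plus cnt 32-byte span records.
 * Returns the span count, or -1 if the buffer is too short, the tag is
 * wrong, or there are more spans than `max`. */
int zp_parse_frag(const uint8_t *buf, size_t len, uint64_t *vsize,
                  zp_span *out, int max)
{
    int cnt, i;

    assert(buf && vsize && out);
    if (len < 32 || buf[0] != 'f' || buf[1] != 'r' ||
        buf[2] != 'a' || buf[3] != 'g')
        return -1;
    cnt = (int)zp_rd_le(buf + 30, 2);          /* u16 cnt at offset 30 */
    if (cnt > max || len - 32 < (size_t)cnt * 32)
        return -1;
    *vsize = zp_rd_le(buf + 8, 8);             /* u64 vsize at offset 8 */
    for (i = 0; i < cnt; i++) {
        const uint8_t *p = buf + 32 + (size_t)i * 32;
        out[i].voffs  = zp_rd_le(p, 8);
        out[i].doffs  = zp_rd_le(p + 8, 8);
        out[i].csize  = (uint32_t)zp_rd_le(p + 16, 4);
        out[i].usize  = (uint32_t)zp_rd_le(p + 20, 4);
        out[i].crc32  = (uint32_t)zp_rd_le(p + 24, 4);
        out[i].method = (uint16_t)zp_rd_le(p + 28, 2);
        out[i].flag   = (uint16_t)zp_rd_le(p + 30, 2);
    }
    return cnt;
}
```

Reading field by field rather than casting the buffer to a struct avoids depending on compiler padding and host byte order.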

maybe, I could break with zip conventions and store spans info in a more
space compact manner:

VLI2
0xxxxxxx
10xxxxxx xxxxxxxx
110xxxxx xxxxxxxx xxxxxxxx
1110xxxx xxxxxxxx xxxxxxxx xxxxxxxx
11110xxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
111110xx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
1111110x xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
11111110 xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx

{
FOURCC fcc; //'frag'
VLI2 vsize; //virtual file size
VLI2 cnt;

{
VLI2 voffs; //virtual offset
VLI2 doffs; //data offset (in archive)
VLI2 csize;
VLI2 usize;
u32 crc32;
byte method;
byte flag;
}span[cnt];
}
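
A sketch of how the VLI2 patterns above might be encoded and decoded. The post does not say how the payload bits are ordered, so this assumes most-significant bits first (UTF-8 style). The table tops out at 7 extra bytes, so only values below 2^56 fit. `vli2_encode` and `vli2_decode` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode v into out (room for 8 bytes). The first byte starts with n one-bits
 * and a zero bit, where n is the number of extra bytes (0..7).
 * Returns bytes written, or 0 if v needs more than 56 bits. */
int vli2_encode(uint64_t v, uint8_t *out)
{
    int n = 0, i;

    assert(out);
    /* n extra bytes carry 7 + 7*n payload bits in total */
    while (n < 8 && (v >> (7 + 7 * n)) != 0)
        n++;
    if (n > 7)
        return 0;
    out[0] = (uint8_t)(((0xFF << (8 - n)) & 0xFF) | (v >> (8 * n)));
    for (i = 1; i <= n; i++)
        out[i] = (uint8_t)(v >> (8 * (n - i)));
    return n + 1;
}

/* Decode one value from in[0..len). Returns bytes consumed, or 0 if the
 * input is truncated or starts with the unassigned prefix 11111111. */
int vli2_decode(const uint8_t *in, size_t len, uint64_t *v)
{
    int n = 0, i;
    uint64_t r;

    assert(in && v);
    if (len == 0)
        return 0;
    while (n < 8 && (in[0] & (0x80 >> n)))     /* count leading one-bits */
        n++;
    if (n > 7 || (size_t)n + 1 > len)
        return 0;
    r = in[0] & (0x7F >> n);                   /* payload bits of first byte */
    for (i = 1; i <= n; i++)
        r = (r << 8) | in[i];
    *v = r;
    return n + 1;
}
```

Under these assumptions, 300 (0x12C) encodes as the two bytes 0x81 0x2C, and anything below 128 stays a single byte.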

presumably any inter-span parts of the file space would be filled with
zeroes.
one could probably expect that the spans be sorted and non-overlapping.
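
If the spans really are sorted and non-overlapping, finding the span that covers a given virtual offset is a plain binary search, and a miss means the reader hands back zeroes. A minimal sketch, with the hypothetical names `zp_span_range` and `zp_find_span`:

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical: just the fields needed for lookup */
typedef struct {
    uint64_t voffs;   /* virtual offset where the span starts */
    uint32_t usize;   /* uncompressed length of the span */
} zp_span_range;

/* Binary search over spans sorted by voffs and non-overlapping.
 * Returns the index of the span containing voff, or -1 if voff falls in
 * a hole between spans (which would read back as zeroes). */
int zp_find_span(const zp_span_range *s, int cnt, uint64_t voff)
{
    int lo = 0, hi = cnt - 1;

    assert(cnt >= 0);
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (voff < s[mid].voffs)
            hi = mid - 1;                       /* before this span */
        else if (voff - s[mid].voffs >= s[mid].usize)
            lo = mid + 1;                       /* past the end of this span */
        else
            return mid;                         /* inside this span */
    }
    return -1;
}
```

A read would then decompress only the span it hits, which is where keeping spans small pays off for buffering.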

an implementation could probably limit the max span length, eg, to something
like 64kB (should be a good limit for efficient buffering, but a little lame
for particularly large files, so for bigger files slightly larger spans may
make more sense).

or such...

