Author File size limitations
Kevin Brault

2005-11-13, 3:55 am

Hello everyone,

My Zip backup files have grown to over 4 GB in size, and I believe this
is causing problems (Windows XP - NTFS).

Is there a file size limitation for zip files?

What other commonly used formats are there that do not have this limitation?


Thanks


Nir Halowani

2005-11-13, 3:55 am


Kevin Brault wrote:
> Hello everyone,
>
> My Zip backup files have grown to over 4 GB in size, and I believe this
> is causing problems (Windows XP - NTFS).
>
> Is there a file size limitation for zip files?
>
> What other commonly used formats are there that do not have this limitation?
>
>
> Thanks


Hi,

The max file size possible is 4294967295 bytes (2^32 - 1), one byte short
of 4 GiB. This limit is due to the zip file format. Allowing files bigger
than 4 GB would need more than 32 bits to store the file size information
in the zip file, resulting in a file that does not conform to the zip file
specification. The file size limitation applies to the resulting zip file
and to any file archived in it.

When working with large files you need to select a temporary directory on
a disk with at least as much free space as the compressed size of the
biggest file. If the option to split large files is selected, you need at
least the compressed size of the biggest file plus the maximum zip size
selected.

You might also consider incremental backups, which include only the files
that have been modified since the last backup and normally clear the
archive bit, or differential backups, which include the files that have
changed since the last full or incremental backup and do not clear the
archive bit.
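
To make the arithmetic concrete, here is a minimal sketch in Python. It is not taken from any zip tool; the constant and helper names are made up for illustration. It only relies on the fact that the classic zip headers store each size in a 4-byte unsigned field.

```python
import struct

# Largest value a 4-byte unsigned field can hold, which is what the classic
# (non-zip64) zip headers use for the compressed and uncompressed sizes.
ZIP32_MAX = 2**32 - 1


def fits_in_classic_zip(size_in_bytes):
    """Hypothetical helper: True if the size can be stored in 32 bits."""
    return 0 <= size_in_bytes <= ZIP32_MAX


def pack_size(size_in_bytes):
    """Pack a size as a little-endian 4-byte field, or return None on overflow."""
    try:
        return struct.pack("<I", size_in_bytes)
    except struct.error:
        return None


print(ZIP32_MAX)                        # 4294967295
print(fits_in_classic_zip(4 * 2**30))   # False: exactly 4 GiB is one byte too many
print(pack_size(ZIP32_MAX))             # b'\xff\xff\xff\xff'
print(pack_size(ZIP32_MAX + 1))         # None
```

So a backup that crosses 4 GiB simply cannot have its size written into those fields.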

- Nir

cr88192

2005-11-13, 7:55 am


"Kevin Brault" <sharptech@cox.net> wrote in message
news:%%ydf.376$xz.184@fed1read05...
> Hello everyone,
>
> My Zip backup files have grown to over 4 GB in size, and I believe
> this is causing problems (Windows XP - NTFS).
>

NTFS does not have a problem.
the traditional ZIP format does have a 4GB limit, however.

> Is there a file size limitation for zip files?
>

yes.
there is zip64, but it is not widespread.

> What other commonly used formats are there that do not have this
> limitation?
>

RAR.
(maybe) 7zip.

tar might work (I have not checked, I think tar has a 35 bit limit which
leads to a max individual file size of 32GB), dunno about for the whole
archive (a lot of this depends on the 'tar' tool itself).

tar approaches things in a much more stream-like manner, vs zip, which
actually cares where stuff is at. for the file format itself, there is no
inherent upper limit.

gzip might have a problem, given gzip uses 32 bit length fields and such.
maybe it will be fine if one attempts to compress/decompress using pipes
though...

as for bzip2, dunno, maybe no problem, maybe similar to gzip.

a >4GB gz or bz2 file would be scary though (no real error recovery, so any
corruption would destroy the whole file).

maybe things could be broken into smaller chunks?...


dunno if any of this helps.

>
> Thanks
>



Mark Adler

2005-11-13, 6:55 pm

cr88192 wrote:
> tar might work


GNU tar has no file size limit. Even path names have no length limit!
Some older versions of tar limit the size of individual files to 8 GB,
but you probably don't have one of those.
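
For the curious, a rough sketch of where an 8 GB figure can come from. The assumption here, which is not stated elsewhere in this thread, is the classic tar header layout, where the size is written as 11 octal digits in a 12-byte field. The helper name is made up, and Python's tarfile module is used only as a convenient way to build headers.

```python
import tarfile

# Assumption: the classic header holds the size as 11 octal digits.
SIZE_DIGITS = 11
MAX_CLASSIC_SIZE = 8**SIZE_DIGITS - 1   # 8589934591 bytes, one short of 8 GiB


def header_accepts(size, fmt):
    """Hypothetical helper: can a header in this format record the given size?"""
    info = tarfile.TarInfo("big.bin")
    info.size = size
    try:
        info.tobuf(format=fmt)
        return True
    except ValueError:
        # tarfile raises ValueError when a number does not fit its field
        return False


print(MAX_CLASSIC_SIZE)                                              # 8589934591
print(header_accepts(MAX_CLASSIC_SIZE, tarfile.USTAR_FORMAT))        # True
print(header_accepts(MAX_CLASSIC_SIZE + 1, tarfile.USTAR_FORMAT))    # False
print(header_accepts(MAX_CLASSIC_SIZE + 1, tarfile.GNU_FORMAT))      # True
```

Only headers are built here, so no 8 GB file is needed to try it.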

> gzip might have a problem, given gzip uses 32 bit length fields and such.


No, gzip has no problem with files of any size. It uses the length
modulo 2^32 as one of two integrity checks, which works fine with any
size file.
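
A small sketch of what that looks like on disk, assuming a single-member gzip stream: the last 8 bytes are a CRC-32 of the uncompressed data followed by its length modulo 2^32, both little-endian. The function name is made up for this example.

```python
import gzip
import struct
import zlib


def gzip_trailer(blob):
    """Return (crc32, isize) read from the last 8 bytes of a one-member gzip stream."""
    return struct.unpack("<II", blob[-8:])


data = b"example payload " * 1000
blob = gzip.compress(data)
crc, isize = gzip_trailer(blob)

print(crc == zlib.crc32(data))       # True: first integrity check
print(isize == len(data) % 2**32)    # True: second integrity check

# The stored length simply wraps around for big inputs. A 5 GiB input
# would record the same value as a 1 GiB input:
print((5 * 2**30) % 2**32 == 2**30)  # True
```

Since the field is only compared against the wrapped length after decompression, nothing breaks when the real length exceeds 32 bits.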

> as for bzip2, dunno, maybe no problem, maybe similar to gzip.


bzip2 is fine with arbitrarily long files as well.

> a >4GB gz or bz2 file would be scary though (no real error recovery, so any
> corruption would destroy the whole file).


bzip2 does have good error recovery for large files. For example if
there is one error in the compressed stream, you should lose only about
900 KB of uncompressed data around it.

gzip has no error recovery, and so if you use gzip (which may be
preferable for speed reasons on very large data sets), then it would be
wise to break up the tar file into pieces and compress them separately.
Breaking it up into multi-megabyte chunks will result in essentially
no degradation in compression. The individual gzip files can simply be
concatenated into a single large output file (in the correct order of
course), and gunzip will automatically decompress them all
transparently. This could be done easily with a simple program that
uses zlib.
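
A minimal sketch of that idea in Python, using its zlib and gzip modules. The function names and the chunk size are arbitrary choices for the example, and it works on bytes in memory for brevity; a real backup tool would read the tar file and write the output in a streaming fashion.

```python
import gzip
import zlib


def gzip_member(chunk, level=6):
    """Compress one chunk as a complete, independent gzip member."""
    # wbits=31 asks zlib to emit a gzip header and trailer around the data.
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)
    return comp.compress(chunk) + comp.flush()


def compress_in_chunks(data, chunk_size):
    """Compress each chunk separately and concatenate the members in order."""
    return b"".join(
        gzip_member(data[start:start + chunk_size])
        for start in range(0, len(data), chunk_size)
    )


data = bytes(range(256)) * 4096                 # 1 MiB of sample input
combined = compress_in_chunks(data, 256 * 1024)

# A multi-member gzip stream decompresses as if it were one file.
print(gzip.decompress(combined) == data)        # True
```

If one member gets damaged, the members after it still start with their own gzip header, so they remain decodable on their own.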

mark

cr88192

2005-11-15, 7:55 am


"Mark Adler" <madler@alumni.caltech.edu> wrote in message
news:1131927735.348944.153430@g49g2000cwa.googlegroups.com...
> cr88192 wrote:
>
> GNU tar has no file size limit. Even path names have no length limit!
> Some older versions of tar limit the size of individual files to 8 GB,
> but you probably don't have one of those.
>

ok.

probably the format varies or such for long path names though (given from
what I remember 100 chars were provided for the path normally).

>
> No, gzip has no problem with files of any size. It uses the length
> modulo 2^32 as one of two integrity checks, which works fine with any
> size file.
>

oh, ok, it is mod 2^32, this makes sense...

just me being more used to algos where the length matters a little more (eg:
my arithmetic coder needs to know the decoded length to decode
correctly...).

>
> bzip2 is fine with arbitrarily long files as well.
>

.

>
> bzip2 does have good error recovery for large files. For example if
> there is one error in the compressed stream, you should lose only about
> 900 KB of uncompressed data around it.
>

ok, I didn't know that...

> gzip has no error recovery, and so if you use gzip (which may be
> preferable for speed reasons on very large data sets), then it would be
> wise to break up the tar file into pieces and compress them separately.
> Breaking it up into multi-megabyte chunks will result in essentially
> no degradation in compression. The individual gzip files can simply be
> concatenated into a single large output file (in the correct order of
> course), and gunzip will automatically decompress them all
> transparently. This could be done easily with a simple program that
> uses zlib.
>

.


I have personally not examined the tools themselves in much detail, but I
guess I know at least something about the formats...

> mark
>


