Re: Opps! I need some serious help.
Mark Adler

2006-07-23, 6:55 pm

Brad,

First off, I'm not clear on how you were going to recover the second
recompressed .tar.gz file even if you had built it correctly. How were
you going to do that? gnutar, or at least the gnutar I have, does not
process concatenated tar streams. It just ignores whatever comes after
the first one.

Second, why did you intend to recompress the already compressed .tar
files? They won't be compressed any further the second time (unless they
are wildly redundant). And why concatenate them? Why aren't you
simply doing something like:

tar -czf bothofthem.tar.gz /var/www/cgi-bin /var/www/html

and not recompressing? By the way, given either what you intended or
what you in fact did, fubar.tar.gz is not a .tar.gz file. It is a
compression of a concatenation of two .tar.gz files, or actually a
compression of a concatenation of a .tar.gz file and a .tar file as you
implemented it.
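
The point about recompression is easy to check. Here is a minimal sketch using
Python's zlib module; the sample data below is made up for illustration (it is
not Brad's data), and the helper name gzip_sizes is hypothetical. It compares
one gzip pass against two, for ordinary text and for a wildly redundant input.

```python
import random
import zlib


def gzip_sizes(data):
    """Return (size after one gzip pass, size after two gzip passes)."""
    def gz(buf):
        # wbits = 16 + MAX_WBITS asks zlib for a gzip header and trailer.
        c = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
        return c.compress(buf) + c.flush()

    once = gz(data)
    twice = gz(once)
    return len(once), len(twice)


# Hypothetical sample inputs, chosen only to illustrate the two cases.
rng = random.Random(0)
words = [b"tar", b"gzip", b"html", b"cgi-bin", b"stream", b"archive", b"header"]
ordinary = b" ".join(rng.choice(words) for _ in range(50_000))
redundant = b"a" * 1_000_000

for label, data in (("ordinary", ordinary), ("wildly redundant", redundant)):
    once, twice = gzip_sizes(data)
    print(f"{label}: raw={len(data)} once={once} twice={twice}")
```

For the ordinary input the second pass should leave the size roughly unchanged,
while the run of identical bytes should shrink again on the second pass, which
is the "wildly redundant" exception.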

Anyway, given the perplexing choices you have made, and the error in
implementing those choices, what you need to do to recover the two
imbedded files (which as far as I can tell you would also have to do if
you had not made the implementation error) is this:

Use zlib to write a small program to uncompress a gzip stream to one
output file, and then write the remaining data after the gzip file in
the input to a second output file. gunzip fubar.tar.gz to get
fubar.tar. Feed your fubar.tar file (which isn't a .tar file) to the
small program, and you will get the two output files. The first one
will be your cgibin.tar (uncompressed), and the second will be what you
called html.tar.gz, but should be named html.tar. Then you can extract
the contents of the .tar files as usual.
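
A minimal sketch of such a small program, written here in Python on top of its
zlib binding rather than in C. The function name split_after_gzip and the chunk
size are assumptions made for illustration. It inflates the first gzip stream
into one file and copies everything after it, untouched, into a second file.

```python
import zlib

CHUNK = 64 * 1024  # arbitrary read size


def split_after_gzip(src_path, first_path, rest_path):
    """Inflate the leading gzip stream of src_path into first_path and copy
    every byte that follows that stream, unchanged, into rest_path."""
    # wbits = 16 + MAX_WBITS makes zlib expect gzip framing.
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
    with open(src_path, "rb") as src, \
            open(first_path, "wb") as first, \
            open(rest_path, "wb") as rest:
        # Feed input until zlib reports the end of the gzip stream.
        while not inflater.eof:
            chunk = src.read(CHUNK)
            if not chunk:
                raise ValueError("input ended inside the gzip stream")
            first.write(inflater.decompress(chunk))
        # Bytes from the last chunk that lie beyond the gzip trailer.
        rest.write(inflater.unused_data)
        # Everything still unread in the input belongs to the second file.
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            rest.write(chunk)
```

With the file names used in this thread, the call would be
split_after_gzip("fubar.tar", "cgibin.tar", "html.tar"), run after gunzip has
produced fubar.tar.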

What saves you here is that cgibin.tar.gz is self-terminating, so it is
possible to know exactly where it ends, and where the next thing
starts. If you had implemented it how you intended, then as far as I
can tell with the available utilities, you would still need to write
the small program above and then the second file would be a real
html.tar.gz instead of an html.tar, and would be extracted as usual.

The manual approach suggested by jasen may work perfectly fine as well,
though it is not deterministic since a faux tar signature may appear by
chance in the preceding data.

mark

Phil Carmody

2006-07-23, 6:55 pm

"Mark Adler" <madler@alumni.caltech.edu> writes:
> Brad,
>
> First off, I'm not clear on how you were going to recover the second
> recompressed .tar.gz file even if you had built it correctly. How were
> you going to do that? gnutar, or at least the gnutar I have, does not
> process concatenated tar streams. It just ignores whatever comes after
> the first one.
>
> Second, why did you intend to recompress the already compressed .tar
> files? They won't be compressed any further the second time (unless they
> are wildly redundant). And why concatenate them? Why aren't you
> simply doing something like:
>
> tar -czf bothofthem.tar.gz /var/www/cgi-bin /var/www/html


Sounds like a throwback to the old pkzip days; that was common
behaviour back then.

Phil
--
The man who is always worrying about whether or not his soul would be
damned generally has a soul that isn't worth a damn.
-- Oliver Wendell Holmes, Sr. (1809-1894), American physician and writer

Paul Marquess

2006-07-23, 6:55 pm

Mark Adler wrote:

> Brad,
>
> First off, I'm not clear on how you were going to recover the second
> recompressed .tar.gz file even if you had built it correctly. How were
> you going to do that? gnutar, or at least the gnutar I have, does not
> process concatenated tar streams. It just ignores whatever comes after
> the first one.
>
> Second, why did you intend to recompress the already compressed .tar
> files? They won't be compressed any further the second time (unless they
> are wildly redundant). And why concatenate them? Why aren't you
> simply doing something like:
>
> tar -czf bothofthem.tar.gz /var/www/cgi-bin /var/www/html
>
> and not recompressing? By the way, given either what you intended or
> what you in fact did, fubar.tar.gz is not a .tar.gz file. It is a
> compression of a concatenation of two .tar.gz files, or actually a
> compression of a concatenation of a .tar.gz file and a .tar file as you
> implemented it.
>
> Anyway, given the perplexing choices you have made, and the error in
> implementing those choices, what you need to do to recover the two
> imbedded files (which as far as I can tell you would also have to do if
> you had not made the implementation error) is this:
>
> Use zlib to write a small program to uncompress a gzip stream to one
> output file, and then write the remaining data after the gzip file in
> the input to a second output file. gunzip fubar.tar.gz to get
> fubar.tar. Feed your fubar.tar file (which isn't a .tar file) to the
> small program, and you will get the two output files. The first one
> will be your cgibin.tar (uncompressed), and the second will be what you
> called html.tar.gz, but should be named html.tar. Then you can extract
> the contents of the .tar files as usual.
>
> What saves you here is that cgibin.tar.gz is self-terminating, so it is
> possible to know exactly where it ends, and where the next thing
> starts. If you had implemented it how you intended, then as far as I
> can tell with the available utilities, you would still need to write
> the small program above and then the second file would be a real
> html.tar.gz instead of an html.tar, and would be extracted as usual.
>
> The manual approach suggested by jasen may work perfectly fine as well,
> though it is not deterministic since a faux tar signature may appear by
> chance in the preceding data.
>
> mark


Here is a possible Perl-based solution that takes the fubar.tar.gz file as
an input parameter and creates two tar files from its contents. The output
file names are hard-wired in the script to be cgibin.tar and html.tar.


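use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

die "Usage: $0 inputfile\n"
    unless @ARGV == 1;

my $input  = $ARGV[0];
my $cgibin = "cgibin.tar";
my $html   = "html.tar";

# Outer layer: let the external gunzip strip the outer compression.
# The list form of open avoids passing the file name through a shell.
open my $in, "-|", "gunzip", "-c", $input
    or die "Cannot open $input: $!\n";
binmode $in;

# Inner layer: the pipe now starts with the original cgibin.tar.gz.
my $gunz = IO::Uncompress::Gunzip->new($in)
    or die "Error Gunzipping: $GunzipError\n";

open my $out, ">", $cgibin
    or die "Cannot open $cgibin: $!\n";
binmode $out;

# Uncompress the first gzip stream into cgibin.tar.
my ($buffer, $status);
while (($status = $gunz->read($buffer)) > 0)
{
    print $out $buffer;
}
die "Error Gunzipping: $GunzipError\n" if $status < 0;
close $out;

open $out, ">", $html
    or die "Cannot open $html: $!\n";
binmode $out;

# Bytes the Gunzip object read past the end of the first stream ...
print $out $gunz->trailingData();

# ... followed by whatever is still unread in the pipe.
while (read($in, $buffer, 4096) > 0)
{
    print $out $buffer;
}
close $out;
close $in;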
jasen

2006-07-25, 7:55 am

On 2006-07-23, Mark Adler <madler@alumni.caltech.edu> wrote:
> Brad,
>
> First off, I'm not clear on how you were going to recover the second
> recompressed .tar.gz file even if you had built it correctly.


> How were
> you going to do that? gnutar, or at least the gnutar I have, does not
> process concatenated tar streams. It just ignores whatever comes after
> the first one.


zcat double-tar | tar zxi

This has been available in GNU tar since at least September 1993 (the date at
the bottom of the man page), which matches my recollection.
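
To illustrate what the i (ignore zeros) option changes, here is a small sketch
using Python's tarfile module, whose ignore_zeros argument plays a similar
role. The member names and helper functions are hypothetical stand-ins, not
the files from this thread. It builds two one-member archives in memory,
concatenates them, and lists the members with and without skipping the zero
blocks that end the first archive.

```python
import io
import tarfile


def make_tar(name, payload):
    """Build a one-member tar archive in memory and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def member_names(data, ignore_zeros):
    """List the member names found in a possibly concatenated tar stream."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:",
                      ignore_zeros=ignore_zeros) as tar:
        return tar.getnames()


# Two archives glued together, as in a concatenated tar stream.
both = make_tar("cgi-bin/a.txt", b"first") + make_tar("html/b.txt", b"second")

print(member_names(both, ignore_zeros=False))  # stops at the first end marker
print(member_names(both, ignore_zeros=True))   # keeps reading past it
```

Without the option only the first archive's member should be listed; with it,
the reader continues into the second archive.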

> Second, why did you intend to recompress the already compressed .tar
> files?


I expect he noticed that it made it smaller :) but didn't notice why.

> tar -czf bothofthem.tar.gz /var/www/cgi-bin /var/www/html


That would probably be the best solution.

Bye.
Jasen