


Utility for compacted file with ".C" file extension?
-hh

2007-08-09, 6:56 pm

Hi,

Looks like this has been a tough question that hasn't gotten answers
in the past. Archives have a couple of examples of folks who appear
to have been looking for the same thing that I am.

The source is an old file that has been on a Unix system, vintage
1994, probably BSD flavor.

The utility's purpose was to make files smaller; the name "compact" sounds
very familiar. The source was a text file (not C/C++ source) that
had no extension, but after running the file-size reduction utility
it received a suffix of ".C" (might have been ".c"?).

This sounds similar to a .tar or .z, but apparently it doesn't use the
same compression methods. One of the previous posters offered this
clue:

"[I have already checked that they are not from the V.42bis-style
"compact" mentioned in the comp.compression FAQ Subject #11.]"

...which I assume is also the case here.
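Telling these formats apart usually comes down to the first two bytes of the file. A minimal sketch in Python, assuming the two-byte signatures below, which are the ones I recall from the Unix file(1) magic database (check them against your own system's magic file): \037\235 for compress (.Z), \037\036 for pack (.z), and 0x1FFF (octal 017777) for compact (.C), listed in both byte orders because the order on disk may depend on the machine that wrote the file. The identify() and identify_file() names are made up for this example.

```python
# Two-byte signatures as recalled from the file(1) magic database; treat these
# as assumptions and verify them against your own system's magic file.
SIGNATURES = {
    b"\x1f\x9d": "compress (.Z)",
    b"\x1f\x1e": "pack (.z)",
    b"\x1f\x1f": "old pack",
    b"\x1f\xff": "compact (.C)",  # magic 017777 octal, one byte order
    b"\xff\x1f": "compact (.C)",  # same magic, opposite byte order
    b"\x1f\x8b": "gzip (.gz)",
}


def identify(header: bytes) -> str:
    """Guess which compressor produced a file from its first two bytes."""
    return SIGNATURES.get(header[:2], "unknown")


def identify_file(path: str) -> str:
    """Read only the first two bytes of a file and look them up."""
    with open(path, "rb") as f:
        return identify(f.read(2))
```

A result of "unknown" rules nothing out; it only means none of these particular signatures matched.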


A prior suggestion was to post a hex dump of the first 256 bytes of
one of the files, as that might help to identify the format. I'll look
into doing that next, if anyone is interested in this challenge, assuming
that no one knows of an easier solution (i.e., "product X" does what you
need).
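Producing that dump takes only a few lines. A minimal sketch in Python that prints an offset, the bytes in hex, and their printable ASCII form, 16 bytes per row; hexdump() and dump_head() are names made up for this example, and on most Unix systems `od -c` (or `hexdump -C`, where available) gives comparable output.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of: offset, hex bytes, printable ASCII ('.' for the rest)."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on a short last row.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


def dump_head(path: str, count: int = 256) -> str:
    """Hex-dump the first `count` bytes of a file (256 by default)."""
    with open(path, "rb") as f:
        return hexdump(f.read(count))
```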


TIA,


-hh

Thomas Pornin

2007-08-09, 6:56 pm

According to -hh <recscuba_google@huntzinger.com>:
> Looks like this has been a tough question that hasn't gotten answers
> in the past. Archives have a couple of examples of folks who appear
> to have been looking for the same thing that I am.
>
> The source is an old file that has been on a Unix system, vintage
> 1994, probably BSD flavor.
>
> Utlity's purpose was to make files smaller; the name "compact" sounds
> very familiar. The source was a text file (not a C/C++ source) that
> had no extension, but after running of the file size reduction utility
> received a suffix of ".C" (might have been ".c"?).


Some googling showed me this:

http://docs.hp.com/en/B2355-90128/compact.1.html

which apparently means that HP/UX has the utility you are looking for.
HP/UX is not (and has never been, as far as I know) open source, hence
there is no available source code. The "compact" utility is there
documented as having been authored by "Colin L. Mc Master".

Some more googling sent me to:

http://www.mcmaster.org/index.shtml

which is the homepage for someone called "Colin L. Mc Master", but
I don't know if that is the same person as above.

I also got this:

http://cgi.sover.net/cgi-bin/bsdi-m...ion=1&apropos=0

which is the man page for "compact" as it was in BSD/386, aka BSD/OS,
aka BSDI. That one was not open source either. And it has now
disappeared. 386BSD was supposed to have inherited much of BSD/386, and
386BSD morphed into NetBSD and FreeBSD at some time around 1993. Modern
FreeBSD and NetBSD do not have "compact".


Now it so happens that HP maintains a few systems online with public
shell access; you just need to register at www.testdrive.hp.com, and it
is free. Some of the machines run under HP/UX. I just tried: at least
the td192.testdrive.hp.com machine (running HP-UX 11i 11.1) has the
"compact" utility, and, more importantly, the "uncompact" utility.


--Thomas Pornin
Phil Carmody

2007-08-09, 6:56 pm

Thomas Pornin <pornin@bolet.org> writes:
> According to -hh <recscuba_google@huntzinger.com>:

First of all - great googling, Thomas. Without your inspiring
charge, I'd not have been motivated to get googling myself.

Is this the sucker:

http://www.tuhs.org/Archive/PDP-11/...rc/old/compact/

<<<
Index of /Archive/PDP-11/Trees/2.11BSD/usr/src/old/compact

Icon   Name          Last modified      Size
[TXT]  Makefile      26-Aug-1984 17:03  466
[   ]  ccat.sh       12-Feb-1983 06:00  88
[TXT]  compact.c     26-Aug-1984 16:48  6.7K
[TXT]  compact.h     26-Aug-1984 16:48  958
[TXT]  tree.c        26-Aug-1984 16:48  2.9K
[TXT]  uncompact.c   26-Aug-1984 16:48  4.7K

?

Phil
--
Dear aunt, let's set so double the killer delete select all.
-- Microsoft voice recognition live demonstration
-hh

2007-08-09, 6:56 pm

Thanks for the replies so far. Hopefully, I'll get some time tonight
to try them against this "tough nut to crack".


-hh

Matt Mahoney

2007-08-15, 3:56 am

On Aug 9, 5:04 pm, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
wrote:
> Is this the sucker:
>
> http://www.tuhs.org/Archive/PDP-11/...rc/old/compact/


I think this is the oldest program I have tested (1979).
http://cs.fit.edu/~mmahoney/compression/text.html#6483

-- Matt Mahoney

Matt Mahoney

2007-08-15, 6:56 pm

On Aug 15, 4:10 am, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
wrote:
> Matt Mahoney <matmaho...@yahoo.com> writes:
>
> Are there any reference implementations of the early LZ*s that would
> pre-date that? Does Mark's book contain the original reference
> implementations, or modern reworkings?


Good question. Lempel and Ziv's paper was in 1976, Huffman's in 1952.
Compression dates back at least to Morse code in the early 1840s.
The problem would be finding implementations you could run on modern
computers. Compact was written in K&R C and was easy to port to
Ubuntu/g++ with only minor changes. It has some dependencies on
endianness and word size, and a couple of obvious compiler errors and
missing headers. But C has only been widely used since about 1978.
Anything older than that would likely be written in either FORTRAN or
a dead language, most likely assembler for some ancient hardware.
Compact appears to have been written for the VAX/PDP11 and Sun (68000
I believe) under UNIX. This was before the original IBM PC or the
Commodore 64.
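The endianness dependency typically shows up when a program writes a native short or int straight to disk and reads it back the same way, so a file made on a little-endian VAX can decode differently on a big-endian 68000 Sun. A minimal sketch of the portable alternative, assuming a 16-bit magic number at the start of the file (compact's appears to be 017777 octal, 0x1FFF, going by the file(1) magic database): decode the field with an explicit byte order and try both. read_u16() and detect_byte_order() are hypothetical helpers, not code from compact.c.

```python
import struct
from typing import Optional


def read_u16(buf: bytes, little_endian: bool) -> int:
    """Decode an unsigned 16-bit field with an explicit byte order, whatever the host CPU is."""
    fmt = "<H" if little_endian else ">H"
    return struct.unpack(fmt, buf[:2])[0]


def detect_byte_order(header: bytes, magic: int) -> Optional[str]:
    """Return "little" or "big" depending on which byte order makes the header equal `magic`."""
    if len(header) < 2:
        return None
    if read_u16(header, little_endian=True) == magic:
        return "little"
    if read_u16(header, little_endian=False) == magic:
        return "big"
    return None  # neither order matches: probably not this format
```

Once the byte order is known, every later multi-byte field can be read with the same explicit format instead of relying on the host's native layout.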

I think the next oldest program I have tested is UNIX compress
(1990). I had looked at Ross Williams's code from 1991
(http://www.ross.net/compression/). It is in C, but the code is in
the form of subroutines, not complete programs. I could write a
wrapper, I suppose. His SAKDC is older, but that code is gone.

-- Matt Mahoney

markn@ieee.org

2007-08-16, 7:56 am

On Aug 15, 3:10 am, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
wrote:
> Are there any reference implementations of the early LZ*s that would
> pre-date that? Does Mark's book contain the original reference
> implementations, or modern reworkings?
>


I haven't tried to trace the family line all the way back to find the
true first implementations, but the first well known LZ78
implementation seems to be compress, which sprang Athena-like into
being from the brow of Spencer Thomas in 1984. There is a Wikipedia
entry on it that has been worked up pretty thoroughly, judging by the
history:

http://en.wikipedia.org/wiki/Compress

(The article could use a minor update on LZW patent status from anyone
who has some time to do a public service.)

If the timing in the article is correct, compress was probably the
first public implementation of decent quality. You
would think Ziv and Lempel would have at least had test code, but a
recent Q/A didn't give away any secrets in that area:
http://marknelson.us/2007/07/13/lempel-award/.
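compress is built on LZW, Terry Welch's 1984 refinement of LZ78: the dictionary starts with all 256 single bytes, and each time the longest known string is emitted, that string plus the next input byte becomes a new entry. A minimal sketch of just that core in Python; the real compress also packs the codes into variable-width bit fields starting at 9 bits, caps the dictionary size, and can reset it, none of which is shown here. lzw_encode() and lzw_decode() are illustrative names.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Emit the code of the longest known prefix, then add prefix + next byte to the dictionary."""
    dictionary = {bytes([i]): i for i in range(256)}
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc  # keep extending the current match
        else:
            codes.append(dictionary[w])
            dictionary[wc] = len(dictionary)  # next free code
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly while decoding."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]  # code the encoder defined on the step just before
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)
```

The special case in lzw_decode() handles input like "aaaa", where the encoder uses a code in the same step it creates it.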

The first good LZ77 implementations I remember seeing came from
Haruhiko Okumura, who produced LZSS.C, LZHUF.C, and LZARI.C. These are
all from around 1989.

http://www.programmersheaven.com/do...0/download.aspx
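LZSS is an LZ77 variant: the output is a mix of literal bytes and (distance, length) references back into a sliding window of recent data, and a reference is used only when the match is long enough to pay for itself. A minimal sketch of that parse in Python, assuming the window and match limits below; it is not a port of Okumura's LZSS.C, which uses a much faster match search and packs the tokens into bytes with flag bits. lzss_tokens() and lzss_decode() are illustrative names.

```python
WINDOW = 4096    # how far back a match may start (assumed for illustration)
MIN_MATCH = 3    # shorter matches are emitted as literals
MAX_MATCH = 18   # longest match a single reference may cover


def lzss_tokens(data: bytes) -> list:
    """Greedy parse into literal ints and (distance, length) tuples."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - WINDOW), i):  # brute-force search of the window
            length = 0
            while (length < MAX_MATCH and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= MIN_MATCH:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lzss_decode(tokens: list) -> bytes:
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, int):
            out.append(tok)
        else:
            dist, length = tok
            for _ in range(length):  # byte-by-byte copy lets a match overlap its own output
                out.append(out[-dist])
    return bytes(out)
```

The brute-force search is quadratic in the window size, which is fine for experiments but is exactly the cost that tree- or hash-based match finders in real coders avoid.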

Since LZSS had been around for at least five years at that point,
there is undoubtedly some code floating around that predates it, but I
don't know of anything public. I have always suspected, but have no
concrete proof, that seeing that code at just the right time
jump-started Phil Katz's conversion from the SEA LZW-based ARC format to
the deflate-based ZIP format.

The code in my book was either original or reworked. Most programs
like compact.c and LZSS.C are very old-school C (C written as assembly
language) and are totally incomprehensible. I usually tried to
dispense with efficiency in favor of readability, for better or worse.

|
| Mark Nelson - http://marknelson.us
|


Copyright 2008 codecomments.com