
Re: Fujitsu NetCOBOL for .NET
Richard

2006-01-20, 9:55 pm

> ..3 bytes .. The resulting file was 115 bytes.

File header information of 112 bytes.

> Next I took an uncompressible file (a zip file) of 6550 bytes and zipped it.
> The result was 6662.


File header information of 112 bytes.

> text file of 69 bytes. I zipped it. The result was 176 bytes.


176 - 112 -> 64. The 69 bytes were compressed to 64, probably using
RLE.
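
The constant 112-byte overhead can be checked with a rough sketch like the one below. It relies on the ZIP layout, where a one-member archive carries a 30-byte local header, a 46-byte central directory entry, and a 22-byte end record, and stores the member name twice. The 7-character name `abc.txt` is an assumption (the thread does not give the real file name), as is the absence of extra fields or comments; under those assumptions the overhead comes to 98 + 2 * 7 = 112.

```python
import io
import zipfile

def archive_size(name: str, payload: bytes, method: int = zipfile.ZIP_STORED) -> int:
    """Size in bytes of a one-member zip archive built in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=method) as zf:
        zf.writestr(name, payload)
    return len(buf.getvalue())

def overhead(name: str, payload: bytes) -> int:
    """Archive bytes that are not member data, for a stored (uncompressed) member."""
    return archive_size(name, payload) - len(payload)

# Hypothetical 7-character member name; the real one is not given in the thread.
name = "abc.txt"
print(overhead(name, b"abc"))        # a 3-byte file, stored as-is
print(overhead(name, bytes(6550)))   # a 6550-byte file, stored as-is
```

Both calls should report the same overhead, since it depends on the member name and not on the size of the data.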

With typical Cobol record data there are large runs of spaces and zero
characters because the fields must allow for the largest expected data
item. This type of data can be compressed much smaller than 'text'
that has little white space.

Zip chooses the 'best' compression to use for a particular type of
data. For small text files it may choose RLE, for large English text it
may choose to use a dictionary-based mechanism that has a large table but
very small resulting data.

Michael Wojcik

2006-01-22, 9:55 pm


In article <1137783996.596533.80170@f14g2000cwb.googlegroups.com>, "Richard" <riplin@Azonic.co.nz> writes:
>
>
> 176 - 112 -> 64. The 69 bytes were compressed to 64, probably using
> RLE.


With a 69-byte "text" file, I doubt you'd get much compression, if
any, from RLE; plain ASCII text (which I assume is what we're dealing
with here) rarely has much in the way of runs (of whatever symbol
length). That looks like the result of Deflate (probably straight
LZ77) to me.

I just did a bit of testing of my own, and Info-Zip and WinZip both
compressed a 69-byte text file to 64 bytes using Deflate (defF, defN,
or defX, depending on the "fast", "normal", or "max" options).

Not that it matters...

> With typical Cobol record data there are large runs of spaces and zero
> characters because the fields must allow for the largest expected data
> item. This type of data can be compressed much smaller than 'text'
> that has little white space.


Right, and here RLE often works very well, which is why we include a
handful of RLE compressors (for 8-bit and 16-bit symbols, and for
smaller or larger inputs) with MF COBOL.
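
To make the contrast concrete, here is a minimal sketch comparing a naive byte-oriented RLE with raw Deflate on two made-up samples: a short sentence and a fixed-width record padded with spaces and zeros. The `rle_encode` scheme (count, value pairs) is only an illustration and is not the format used by the MF COBOL compressors or by any zip tool; the sample data is invented.

```python
import zlib

def rle_encode(data: bytes) -> bytes:
    """Naive RLE: each run becomes a (count, value) byte pair, count <= 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Inverse of rle_encode."""
    out = bytearray()
    for i in range(0, len(data), 2):
        out += bytes([data[i + 1]]) * data[i]
    return bytes(out)

def raw_deflate(data: bytes) -> bytes:
    """Raw Deflate stream (no zlib header or checksum)."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()

# Invented samples: plain text with no runs, and a padded fixed-width record.
text = b"The quick brown fox jumps over the lazy dog, then naps in the sun."
record = b"SMITH".ljust(30) + b"42".rjust(9, b"0") + b"NZ".ljust(41)

for label, sample in (("text", text), ("record", record)):
    print(label, len(sample), len(rle_encode(sample)), len(raw_deflate(sample)))
```

With this scheme the text sample grows under RLE because it has no runs to exploit, while the padded record shrinks considerably.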

> Zip chooses the 'best' compression to use for a particular type of
> data. For small text files it may choose RLE, for large English text it
> may choose to use a dictionary-based mechanism that has a large table but
> very small resulting data.


Actually, Deflate doesn't require much table space. LZ77 itself has no
table at all: its "dictionary" is just a sliding window over the most
recent input (up to 32 KB in Deflate), and the compressed symbols refer
back to it: "this next piece is the same as the 16-byte sequence we saw
124 bytes ago". So most of the dictionary is actually stored in the user
data itself. The only tables are the Huffman tables Deflate uses to
encode the literals, match lengths, and distances, and the compressor
can start a new block with fresh tables when the statistics of the data
stream change.

And those tables are themselves stored compactly: only the code lengths
are sent, and they are in turn run-length and Huffman coded.
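
The back-reference idea ("this next piece is the same as what we saw earlier") can be sketched with a toy greedy LZ77 tokenizer. This is only an illustration, not the matcher used by Info-Zip, WinZip, or zlib; the function names are made up, and the window size and match-length limits are borrowed from Deflate as assumptions. It leaves out the Huffman coding stage entirely.

```python
def lz77_tokens(data: bytes, window: int = 32768, min_len: int = 3, max_len: int = 258):
    """Toy greedy LZ77: return literal bytes (int) and (distance, length) back-references."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the window for the longest earlier match.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens

def lz77_expand(tokens) -> bytes:
    """Rebuild the data; the 'dictionary' is just the output produced so far."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte by byte, so overlapping copies work
        else:
            out.append(tok)
    return bytes(out)

print(lz77_tokens(b"abcabcabcabc"))
print(lz77_tokens(b" " * 40))
```

Note that the decoder needs no separate table: every reference points into data it has already reconstructed, which is also why a long run of spaces collapses to one literal plus a single overlapping copy.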

--
Michael Wojcik michael.wojcik@microfocus.com

Sure we're tossing out fluff, but tell me, where does anyone deal in words
with substance? -- Haruki Murakami (trans Alfred Birnbaum)

Copyright 2008 codecomments.com